From 3ce34b4ad34e75a8dacebf8d3958bb23be5ffdf2 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Wed, 7 Jun 2023 10:01:32 -0400 Subject: Fixed missing ; for hex char A semicolon was missing on a not equal sign hex character. --- src/c-st-ext.adoc | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/c-st-ext.adoc b/src/c-st-ext.adoc index 0998b90..e193261 100644 --- a/src/c-st-ext.adoc +++ b/src/c-st-ext.adoc @@ -306,8 +306,7 @@ These instructions use the CI format. C.LWSP loads a 32-bit value from memory into register _rd_. It computes an effective address by adding the _zero_-extended offset, scaled by 4, to the stack pointer, `x2`. It expands to `lw rd, offset(x2)`. C.LWSP is -only valid when _rd_≠x0 the code -points with _rd_=x0 are reserved. +only valid when _rd_≠x0 the code points with _rd_=x0 are reserved. C.LDSP is an RV64C/RV128C-only instruction that loads a 64-bit value from memory into register _rd_. It computes its effective address by @@ -320,7 +319,7 @@ C.LQSP is an RV128C-only instruction that loads a 128-bit value from memory into register _rd_. It computes its effective address by adding the zero-extended offset, scaled by 16, to the stack pointer, `x2`. It expands to `lq rd, offset(x2)`. C.LQSP is only valid when -_rd_≠x0 the code points with +_rd_≠x0 the code points with _rd_=x0 are reserved. C.FLWSP is an RV32FC-only instruction that loads a single-precision -- cgit v1.1 From 9cd24d514bc52cb3863b8a547cd1ed40d07727ea Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Mon, 26 Jun 2023 11:39:03 -0400 Subject: Addition of Vector spec. Added Vector spec asciidoc to Ch. 22. Moved all heading levels up one. Added Vector appendices. Fixed erroneous heading levels caused by equal signs without spaces after them. 
--- src/riscv-unprivileged.adoc | 5 + src/v-st-ext.adoc | 5163 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 5168 insertions(+) diff --git a/src/riscv-unprivileged.adoc b/src/riscv-unprivileged.adoc index b89a44d..74c096a 100644 --- a/src/riscv-unprivileged.adoc +++ b/src/riscv-unprivileged.adoc @@ -143,6 +143,11 @@ include::mm-eplan.adoc[] //memory.tex include::mm-formal.adoc[] //end of memory.tex, memory-model-alloy.tex, memory-model-herd.tex +//Appendices for Vector +include::vector-examples.adoc[] +include::calling-convention.adoc[] +include::fraclmul.adoc[] +//End of Vector appendices include::index.adoc[] // this is generated generated from index markers. include::bibliography.adoc[] diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc index 88dcf8d..e52bc59 100644 --- a/src/v-st-ext.adoc +++ b/src/v-st-ext.adoc @@ -12,3 +12,5166 @@ with later vector extensions supporting richer functionality for certain domains._ ==== +=== Introduction + +This document is version 1.1-draft of the RISC-V vector extension. + +NOTE: This version holds updates gathered after the start of the +public review. The spec will have a final update to version 2.0 at +time of ratification. + +This spec includes the complete set of currently frozen vector +instructions. Other instructions that have been considered during +development but are not present in this document are not included in +the review and ratification process, and may be completely revised or +abandoned. Section <> lists the standard +vector extensions and which instructions and element widths are +supported by each extension. + +=== Implementation-defined Constant Parameters + +Each hart supporting a vector extension defines two parameters: + +. The maximum size in bits of a vector element that any operation can produce or consume, _ELEN_ {ge} 8, which +must be a power of 2. +. The number of bits in a single vector register, _VLEN_ {ge} ELEN, which must be a power of 2, and must be no greater than 2^16^. 
+ +Standard vector extensions (Section <>) and +architecture profiles may set further constraints on _ELEN_ and _VLEN_. + +NOTE: Future extensions may allow ELEN {gt} VLEN by holding one +element using bits from multiple vector registers, but this current +proposal does not include this option. + +NOTE: The upper limit on VLEN allows software to know that indices +will fit into 16 bits (largest VLMAX of 65,536 occurs for LMUL=8 and +SEW=8 with VLEN=65,536). Any future extension beyond 64Kib per vector +register will require new configuration instructions such that +software using the old configuration instructions does not see greater +vector lengths. + +The vector extension supports writing binary code that under certain +constraints will execute portably on harts with different values for +the VLEN parameter, provided the harts support the required element +types and instructions. + +NOTE: Code can be written that will expose differences in +implementation parameters. + +NOTE: In general, thread contexts with active vector state cannot be +migrated during execution between harts that have any difference in +VLEN or ELEN parameters. + +=== Vector Extension Programmer's Model + +The vector extension adds 32 vector registers, and seven unprivileged +CSRs (`vstart`, `vxsat`, `vxrm`, `vcsr`, `vtype`, `vl`, `vlenb`) to a +base scalar RISC-V ISA. + +.New vector CSRs +[cols="2,2,2,10"] +[%autowidth] +|=== +| Address | Privilege | Name | Description + +| 0x008 | URW | vstart | Vector start position +| 0x009 | URW | vxsat | Fixed-Point Saturate Flag +| 0x00A | URW | vxrm | Fixed-Point Rounding Mode +| 0x00F | URW | vcsr | Vector control and status register +| 0xC20 | URO | vl | Vector length +| 0xC21 | URO | vtype | Vector data type register +| 0xC22 | URO | vlenb | VLEN/8 (vector register length in bytes) +|=== + +NOTE: The four CSR numbers `0x00B`-`0x00E` are tentatively reserved +for future vector CSRs, some of which may be mirrored into `vcsr`. 
+ +==== Vector Registers + +The vector extension adds 32 architectural vector registers, +`v0`-`v31` to the base scalar RISC-V ISA. + +Each vector register has a fixed VLEN bits of state. + +==== Vector Context Status in `mstatus` + +A vector context status field, `VS`, is added to `mstatus[10:9]` and shadowed +in `sstatus[10:9]`. It is defined analogously to the floating-point context +status field, `FS`. + +Attempts to execute any vector instruction, or to access the vector +CSRs, raise an illegal-instruction exception when `mstatus.VS` is +set to Off. + +When `mstatus.VS` is set to Initial or Clean, executing any +instruction that changes vector state, including the vector CSRs, will +change `mstatus.VS` to Dirty. +Implementations may also change `mstatus.VS` from Initial or Clean to Dirty +at any time, even when there is no change in vector state. + +NOTE: Accurate setting of `mstatus.VS` is an optimization. Software +will typically use VS to reduce context-swap overhead. + +If `mstatus.VS` is Dirty, `mstatus.SD` is 1; +otherwise, `mstatus.SD` is set in accordance with existing specifications. + +Implementations may have a writable `misa.V` field. Analogous to the +way in which the floating-point unit is handled, the `mstatus.VS` +field may exist even if `misa.V` is clear. + +NOTE: Allowing `mstatus.VS` to exist when `misa.V` is clear, enables +vector emulation and simplifies handling of `mstatus.VS` in systems +with writable `misa.V`. + +==== Vector Context Status in `vsstatus` + +When the hypervisor extension is present, a vector context status field, `VS`, +is added to `vsstatus[10:9]`. +It is defined analogously to the floating-point context status field, `FS`. + +When V=1, both `vsstatus.VS` and `mstatus.VS` are in effect: attempts to +execute any vector instruction, or to access the vector CSRs, raise an +illegal-instruction exception when either field is set to Off. 
+ +When V=1 and neither `vsstatus.VS` nor `mstatus.VS` is set to Off, executing +any instruction that changes vector state, including the vector CSRs, will +change both `mstatus.VS` and `vsstatus.VS` to Dirty. +Implementations may also change `mstatus.VS` or `vsstatus.VS` from Initial or +Clean to Dirty at any time, even when there is no change in vector state. + +If `vsstatus.VS` is Dirty, `vsstatus.SD` is 1; +otherwise, `vsstatus.SD` is set in accordance with existing specifications. + +If `mstatus.VS` is Dirty, `mstatus.SD` is 1; +otherwise, `mstatus.SD` is set in accordance with existing specifications. + +For implementations with a writable `misa.V` field, +the `vsstatus.VS` field may exist even if `misa.V` is clear. + +==== Vector type register, `vtype` + +The read-only XLEN-wide _vector_ _type_ CSR, `vtype` provides the +default type used to interpret the contents of the vector register +file, and can only be updated by `vset{i}vl{i}` instructions. The +vector type determines the organization of elements in each +vector register, and how multiple vector registers are grouped. The +`vtype` register also indicates how masked-off elements and elements +past the current vector length in a vector result are handled. + +NOTE: Allowing updates only via the `vset{i}vl{i}` instructions +simplifies maintenance of the `vtype` register state. + +The `vtype` register has five fields, `vill`, `vma`, `vta`, +`vsew[2:0]`, and `vlmul[2:0]`. Bits `vtype[XLEN-2:8]` should be +written with zero, and non-zero values in this field are reserved. + +include::vtype-format.adoc[] + +NOTE: A small implementation supporting ELEN=32 requires only seven +bits of state in `vtype`: two bits for `ma` and `ta`, two bits for +`vsew[1:0]` and three bits for `vlmul[2:0]`. The illegal value +represented by `vill` can be internally encoded using the illegal 64-bit +combination in `vsew[1:0]` without requiring an additional storage +bit to hold `vill`. 
+ +NOTE: Further standard and custom vector extensions may extend these +fields to support a greater variety of data types. + +NOTE: The primary motivation for the `vtype` CSR is to allow the +vector instruction set to fit into a 32-bit instruction encoding +space. A separate `vset{i}vl{i}` instruction can be used to set `vl` +and/or `vtype` fields before execution of a vector instruction, and +implementations may choose to fuse these two instructions into a single +internal vector microop. In many cases, the `vl` and `vtype` values +can be reused across multiple instructions, reducing the static and +dynamic instruction overhead from the `vset{i}vl{i}` instructions. It +is anticipated that a future extended 64-bit instruction encoding +would allow these fields to be specified statically in the instruction +encoding. + +===== Vector selected element width `vsew[2:0]` + +The value in `vsew` sets the dynamic _selected_ _element_ _width_ +(SEW). By default, a vector register is viewed as being divided into +VLEN/SEW elements. + +.vsew[2:0] (selected element width) encoding +[cols="1,1,1,1"] +[%autowidth] +|=== +3+| vsew[2:0] | SEW + +| 0 | 0 | 0 | 8 +| 0 | 0 | 1 | 16 +| 0 | 1 | 0 | 32 +| 0 | 1 | 1 | 64 +| 1 | X | X | Reserved +|=== + +NOTE: While it is anticipated the larger `vsew[2:0]` encodings +(`100`-`111`) will be used to encode larger SEW, the encodings are +formally _reserved_ at this point. + +.Example VLEN = 128 bits +[cols=">,>"] +[%autowidth] +|=== +| SEW | Elements per vector register + +| 64 | 2 +| 32 | 4 +| 16 | 8 +| 8 | 16 +|=== + +The supported element width may vary with LMUL. + +NOTE: The current set of standard vector extensions do not vary +supported element width with LMUL. Some future extensions may support +larger SEWs only when bits from multiple vector registers are combined +using LMUL. 
In this case, software that relies on large SEW should +attempt to use the largest LMUL, and hence the fewest vector register +groups, to increase the number of implementations on which the code +will run. The `vill` bit in `vtype` should be checked after setting +`vtype` to see if the configuration is supported, and an alternate +code path should be provided if it is not. Alternatively, a profile +can mandate the minimum SEW at each LMUL setting. + +===== Vector Register Grouping (`vlmul[2:0]`) + +Multiple vector registers can be grouped together, so that a single +vector instruction can operate on multiple vector registers. The term +_vector_ _register_ _group_ is used herein to refer to one or more +vector registers used as a single operand to a vector instruction. +Vector register groups can be used to provide greater execution +efficiency for longer application vectors, but the main reason for +their inclusion is to allow double-width or larger elements to be +operated on with the same vector length as single-width elements. The +vector length multiplier, _LMUL_, when greater than 1, represents the +default number of vector registers that are combined to form a vector +register group. Implementations must support LMUL integer values of +1, 2, 4, and 8. + + +NOTE: The vector architecture includes instructions that take multiple +source and destination vector operands with different element widths, +but the same number of elements. The effective LMUL (EMUL) of each +vector operand is determined by the number of registers required to +hold the elements. For example, for a widening add operation, such as +add 32-bit values to produce 64-bit results, a double-width result +requires twice the LMUL of the single-width inputs. + +LMUL can also be a fractional value, reducing the number of bits used +in a single vector register. Fractional LMUL is used to increase the +number of effective usable vector register groups when operating on +mixed-width values. 
+ +NOTE: With only integer LMUL values, a loop operating on a range of +sizes would have to allocate at least one whole vector register +(LMUL=1) for the narrowest data type and then would consume multiple +vector registers (LMUL>1) to form a vector register group for each +wider vector operand. This can limit the number of vector register groups +available. With fractional LMUL, the widest values need occupy only a +single vector register while narrower values can occupy a fraction of +a single vector register, allowing all 32 architectural vector +register names to be used for different values in a vector loop even +when handling mixed-width values. Fractional LMUL implies portions of +vector registers are unused, but in some cases, having more shorter +register-resident vectors improves efficiency relative to fewer longer +register-resident vectors. + +Implementations must provide fractional LMUL settings that allow the +narrowest supported type to occupy a fraction of a vector register +corresponding to the ratio of the narrowest supported type's width to +that of the largest supported type's width. In general, the +requirement is to support LMUL {ge} SEW~MIN~/ELEN, where SEW~MIN~ is +the narrowest supported SEW value and ELEN is the widest supported SEW +value. In the standard extensions, SEW~MIN~=8. For +standard vector extensions with ELEN=32, fractional LMULs of 1/2 and +1/4 must be supported. For standard vector extensions with ELEN=64, +fractional LMULs of 1/2, 1/4, and 1/8 must be supported. + +NOTE: When LMUL < SEW~MIN~/ELEN, there is no guarantee +an implementation would have enough bits in the fractional vector +register to store at least one element, as VLEN=ELEN is a +valid implementation choice. For example, with VLEN=ELEN=32, +and SEW~MIN~=8, an LMUL of 1/8 would only provide four bits of +storage in a vector register. 
+ +For a given supported fractional LMUL setting, implementations must support +SEW settings between SEW~MIN~ and LMUL * ELEN, inclusive. + +The use of `vtype` encodings with LMUL < SEW~MIN~/ELEN is +__reserved__, but implementations can set `vill` if they do not +support these configurations. + +NOTE: Requiring all implementations to set `vill` in this case would +prohibit future use of this case in an extension, so to allow for a +future definition of LMUL>. + +All systems must support all four options: + +[cols="1,1,3,3"] +[%autowidth] +|=== +| `vta` | `vma` | Tail Elements | Inactive Elements + +| 0 | 0 | undisturbed | undisturbed +| 0 | 1 | undisturbed | agnostic +| 1 | 0 | agnostic | undisturbed +| 1 | 1 | agnostic | agnostic +|=== + +Mask destination tail elements are always treated as tail-agnostic, +regardless of the setting of `vta`. + +When a set is marked undisturbed, the corresponding set of destination +elements in a vector register group retain the value they previously +held. + +When a set is marked agnostic, the corresponding set of destination +elements in any vector destination operand can either retain the value +they previously held, or are overwritten with 1s. Within a single vector +instruction, each destination element can be either left undisturbed +or overwritten with 1s, in any combination, and the pattern of +undisturbed or overwritten with 1s is not required to be deterministic +when the instruction is executed with the same inputs. + +NOTE: The agnostic policy was added to accommodate machines with +vector register renaming. With an undisturbed policy, all elements +would have to be read from the old physical destination vector +register to be copied into the new physical destination vector +register. This causes an inefficiency when these inactive or tail +values are not required for subsequent calculations. 
+ +NOTE: The value of all 1s instead of all 0s was chosen for the +overwrite value to discourage software developers from depending on +the value written. + +NOTE: A simple in-order implementation can ignore the settings and +simply execute all vector instructions using the undisturbed +policy. The `vta` and `vma` state bits must still be provided in +`vtype` for compatibility and to support thread migration. + +NOTE: An out-of-order implementation can choose to implement +tail-agnostic + mask-agnostic using tail-agnostic + mask-undisturbed +to reduce implementation complexity. + +NOTE: The definition of agnostic result policy is left loose to +accommodate migrating application threads between harts on a small +in-order core (which probably leaves agnostic regions undisturbed) and +harts on a larger out-of-order core with register renaming (which +probably overwrites agnostic elements with 1s). As it might be +necessary to restart in the middle, we allow arbitrary mixing of +agnostic policies within a single vector instruction. This allowed +mixing of policies also enables implementations that might change +policies for different granules of a vector register, for example, +using undisturbed within a granule that is actively operated on but +renaming to all 1s for granules in the tail. + +In addition, except for mask load instructions, any element in the +tail of a mask result can also be written with the value the +mask-producing operation would have calculated with `vl`=VLMAX. +Furthermore, for mask-logical instructions and `vmsbf.m`, `vmsif.m`, +`vmsof.m` mask-manipulation instructions, any element in the tail of +the result can be written with the value the mask-producing operation +would have calculated with `vl`=VLEN, SEW=8, and LMUL=8 (i.e., all +bits of the mask register can be overwritten). + +NOTE: Mask tails are always treated as agnostic to reduce complexity +of managing mask data, which can be written at bit granularity. 
There +appears to be little software need to support tail-undisturbed for +mask register values. Allowing mask-generating instructions to write +back the result of the instruction avoids the need for logic to mask +out the tail, except mask loads cannot write memory values to +destination mask tails as this would imply accessing memory past +software intent. + +The assembly syntax adds two mandatory flags to the `vsetvli` instruction: + +---- + ta # Tail agnostic + tu # Tail undisturbed + ma # Mask agnostic + mu # Mask undisturbed + + vsetvli t0, a0, e32, m4, ta, ma # Tail agnostic, mask agnostic + vsetvli t0, a0, e32, m4, tu, ma # Tail undisturbed, mask agnostic + vsetvli t0, a0, e32, m4, ta, mu # Tail agnostic, mask undisturbed + vsetvli t0, a0, e32, m4, tu, mu # Tail undisturbed, mask undisturbed +---- + +NOTE: Prior to v0.9, when these flags were not specified on a +`vsetvli`, they defaulted to mask-undisturbed/tail-undisturbed. The +use of `vsetvli` without these flags is deprecated, however, and +specifying a flag setting is now mandatory. The default should +perhaps be tail-agnostic/mask-agnostic, so software has to specify +when it cares about the non-participating elements, but given the +historical meaning of the instruction prior to introduction of these +flags, it was decided to always require them in future assembly code. + +===== Vector Type Illegal `vill` + +The `vill` bit is used to encode that a previous `vset{i}vl{i}` +instruction attempted to write an unsupported value to `vtype`. + +NOTE: The `vill` bit is held in bit XLEN-1 of the CSR to support +checking for illegal values with a branch on the sign bit. + +If the `vill` bit is set, then any attempt to execute a vector instruction +that depends upon `vtype` will raise an illegal-instruction exception. + +NOTE: `vset{i}vl{i}` and whole register loads and stores do not depend +upon `vtype`. + +When the `vill` bit is set, the other XLEN-1 bits in `vtype` shall be +zero. 
+ +==== Vector Length Register `vl` + +The _XLEN_-bit-wide read-only `vl` CSR can only be updated by the +`vset{i}vl{i}` instructions, and the _fault-only-first_ vector load +instruction variants. + +The `vl` register holds an unsigned integer specifying the number of +elements to be updated with results from a vector instruction, as +further detailed in Section <>. + +NOTE: The number of bits implemented in `vl` depends on the +implementation's maximum vector length of the smallest supported +type. The smallest vector implementation with VLEN=32 and supporting +SEW=8 would need at least six bits in `vl` to hold the values 0-32 +(VLEN=32, with LMUL=8 and SEW=8, yields VLMAX=32). + +==== Vector Byte Length `vlenb` + +The _XLEN_-bit-wide read-only CSR `vlenb` holds the value VLEN/8, +i.e., the vector register length in bytes. + +NOTE: The value in `vlenb` is a design-time constant in any +implementation. + +NOTE: Without this CSR, several instructions are needed to calculate +VLEN in bytes, and the code has to disturb current `vl` and `vtype` +settings which require them to be saved and restored. + +==== Vector Start Index CSR `vstart` + +The _XLEN_-bit-wide read-write `vstart` CSR specifies the index of the +first element to be executed by a vector instruction, as described in +Section <>. + +Normally, `vstart` is only written by hardware on a trap on a vector +instruction, with the `vstart` value representing the element on which +the trap was taken (either a synchronous exception or an asynchronous +interrupt), and at which execution should resume after a resumable +trap is handled. + +All vector instructions are defined to begin execution with the +element number given in the `vstart` CSR, leaving earlier elements in +the destination vector undisturbed, and to reset the `vstart` CSR to +zero at the end of execution. + +NOTE: All vector instructions, including `vset{i}vl{i}`, reset the `vstart` +CSR to zero. 
+ +`vstart` is not modified by vector instructions that raise illegal-instruction +exceptions. + +The `vstart` CSR is defined to have only enough writable bits to hold +the largest element index (one less than the maximum VLMAX). + +NOTE: The maximum vector length is obtained with the largest LMUL +setting (8) and the smallest SEW setting (8), so VLMAX_max = 8*VLEN/8 = VLEN. For example, for VLEN=256, `vstart` would have 8 bits to +represent indices from 0 through 255. + +The use of `vstart` values greater than the largest element index for +the current `vtype` setting is reserved. + +NOTE: It is recommended that implementations trap if `vstart` is out +of bounds. It is not required to trap, as a possible future use of +upper `vstart` bits is to store imprecise trap information. + +The `vstart` CSR is writable by unprivileged code, but non-zero +`vstart` values may cause vector instructions to run substantially +slower on some implementations, so `vstart` should not be used by +application programmers. A few vector instructions cannot be +executed with a non-zero `vstart` value and will raise an illegal +instruction exception as defined below. + +NOTE: Making `vstart` visible to unprivileged code supports user-level +threading libraries. + +Implementations are permitted to raise illegal instruction exceptions when +attempting to execute a vector instruction with a value of `vstart` that the +implementation can never produce when executing that same instruction with +the same `vtype` setting. + +NOTE: For example, some implementations will never take interrupts during +execution of a vector arithmetic instruction, instead waiting until the +instruction completes to take the interrupt. Such implementations are +permitted to raise an illegal instruction exception when attempting to execute +a vector arithmetic instruction when `vstart` is nonzero. 
+ +NOTE: When migrating a software thread between two harts with +different microarchitectures, the `vstart` value might not be +supported by the new hart microarchitecture. The runtime on the +receiving hart might then have to emulate instruction execution up to the +next supported `vstart` element position. Alternatively, migration events +can be constrained to only occur at mutually supported `vstart` +locations. + +==== Vector Fixed-Point Rounding Mode Register `vxrm` + +The vector fixed-point rounding-mode register holds a two-bit +read-write rounding-mode field in the least-significant bits +(`vxrm[1:0]`). The upper bits, `vxrm[XLEN-1:2]`, should be written as +zeros. + +The vector fixed-point rounding-mode is given a separate CSR address +to allow independent access, but is also reflected as a field in +`vcsr`. + +NOTE: A new rounding mode can be set while saving the original +rounding mode using a single `csrwi` instruction. + +The fixed-point rounding algorithm is specified as follows. +Suppose the pre-rounding result is `v`, and `d` bits of that result are to be +rounded off. +Then the rounded result is `(v >> d) + r`, where `r` depends on the rounding +mode as specified in the following table. + +.vxrm encoding +[cols="1,1,4,10,5"] +[%autowidth] +|=== +2+| `vxrm[1:0]` | Abbreviation | Rounding Mode | Rounding increment, `r` + +| 0 | 0 | rnu | round-to-nearest-up (add +0.5 LSB) | `v[d-1]` +| 0 | 1 | rne | round-to-nearest-even | `v[d-1] & (v[d-2:0]{ne}0 \| v[d])` +| 1 | 0 | rdn | round-down (truncate) | `0` +| 1 | 1 | rod | round-to-odd (OR bits into LSB, aka "jam") | `!v[d] & v[d-1:0]{ne}0` +|=== + +The rounding functions: +---- +roundoff_unsigned(v, d) = (unsigned(v) >> d) + r +roundoff_signed(v, d) = (signed(v) >> d) + r +---- +are used to represent this operation in the instruction descriptions below. 
+ +==== Vector Fixed-Point Saturation Flag `vxsat` + +The `vxsat` CSR has a single read-write least-significant bit +(`vxsat[0]`) that indicates if a fixed-point instruction has had to +saturate an output value to fit into a destination format. +Bits `vxsat[XLEN-1:1]` should be written as zeros. + +The `vxsat` bit is mirrored in `vcsr`. + +==== Vector Control and Status Register `vcsr` + +The `vxrm` and `vxsat` separate CSRs can also be accessed via fields +in the _XLEN_-bit-wide vector control and status CSR, `vcsr`. + +.vcsr layout +[cols=">2,4,10"] +[%autowidth] +|=== +| Bits | Name | Description + +| XLEN-1:3 | | Reserved +| 2:1 | vxrm[1:0] | Fixed-point rounding mode +| 0 | vxsat | Fixed-point accrued saturation flag +|=== + +==== State of Vector Extension at Reset + +The vector extension must have a consistent state at reset. In +particular, `vtype` and `vl` must have values that can be read and +then restored with a single `vsetvl` instruction. + +NOTE: It is recommended that at reset, `vtype.vill` is set, the +remaining bits in `vtype` are zero, and `vl` is set to zero. + +The `vstart`, `vxrm`, `vxsat` CSRs can have arbitrary values at reset. + +NOTE: Most uses of the vector unit will require an initial `vset{i}vl{i}`, +which will reset `vstart`. The `vxrm` and `vxsat` fields should be +reset explicitly in software before use. + +The vector registers can have arbitrary values at reset. + +=== Mapping of Vector Elements to Vector Register State + +The following diagrams illustrate how different width elements are +packed into the bytes of a vector register depending on the current +SEW and LMUL settings, as well as implementation VLEN. Elements are +packed into each vector register with the least-significant byte in +the lowest-numbered bits. + +The mapping was chosen to provide the simplest and most portable model +for software, but might appear to incur large wiring cost for wider +vector datapaths on certain operations. 
The vector instruction set +was expressly designed to support implementations that internally +rearrange vector data for different SEW to reduce datapath wiring +costs, while externally preserving the simple software model. + +NOTE: For example, microarchitectures can track the EEW with which a +vector register was written, and then insert additional scrambling +operations to rearrange data if the register is accessed with a +different EEW. + +==== Mapping for LMUL = 1 + +When LMUL=1, elements are simply packed in order from the +least-significant to most-significant bits of the vector register. + +NOTE: To increase readability, vector register layouts are drawn with +bytes ordered from right to left with increasing byte address. Bits +within an element are numbered in a little-endian format with +increasing bit index from right to left corresponding to increasing +magnitude. + +---- +LMUL=1 examples. + +The element index is given in hexadecimal and is shown placed at the +least-significant byte of the stored element. + + + VLEN=32b + + Byte 3 2 1 0 + + SEW=8b 3 2 1 0 + SEW=16b 1 0 + SEW=32b 0 + + VLEN=64b + + Byte 7 6 5 4 3 2 1 0 + + SEW=8b 7 6 5 4 3 2 1 0 + SEW=16b 3 2 1 0 + SEW=32b 1 0 + SEW=64b 0 + + VLEN=128b + + Byte F E D C B A 9 8 7 6 5 4 3 2 1 0 + + SEW=8b F E D C B A 9 8 7 6 5 4 3 2 1 0 + SEW=16b 7 6 5 4 3 2 1 0 + SEW=32b 3 2 1 0 + SEW=64b 1 0 + + VLEN=256b + + Byte 1F1E1D1C1B1A19181716151413121110 F E D C B A 9 8 7 6 5 4 3 2 1 0 + + SEW=8b 1F1E1D1C1B1A19181716151413121110 F E D C B A 9 8 7 6 5 4 3 2 1 0 + SEW=16b F E D C B A 9 8 7 6 5 4 3 2 1 0 + SEW=32b 7 6 5 4 3 2 1 0 + SEW=64b 3 2 1 0 +---- + +==== Mapping for LMUL < 1 + +When LMUL < 1, only the first LMUL*VLEN/SEW elements in the vector +register are used. The remaining space in the vector register is +treated as part of the tail, and hence must obey the vta setting. 
+ +---- + Example, VLEN=128b, LMUL=1/4 + + Byte F E D C B A 9 8 7 6 5 4 3 2 1 0 + + SEW=8b - - - - - - - - - - - - 3 2 1 0 + SEW=16b - - - - - - 1 0 + SEW=32b - - - 0 +---- + +==== Mapping for LMUL > 1 + +When vector registers are grouped, the elements of the vector register +group are packed contiguously in element order beginning with the +lowest-numbered vector register and moving to the +next-highest-numbered vector register in the group once each vector +register is filled. + +---- + LMUL > 1 examples + + VLEN=32b, SEW=8b, LMUL=2 + + Byte 3 2 1 0 + v2*n 3 2 1 0 + v2*n+1 7 6 5 4 + + VLEN=32b, SEW=16b, LMUL=2 + + Byte 3 2 1 0 + v2*n 1 0 + v2*n+1 3 2 + + VLEN=32b, SEW=16b, LMUL=4 + + Byte 3 2 1 0 + v4*n 1 0 + v4*n+1 3 2 + v4*n+2 5 4 + v4*n+3 7 6 + + VLEN=32b, SEW=32b, LMUL=4 + + Byte 3 2 1 0 + v4*n 0 + v4*n+1 1 + v4*n+2 2 + v4*n+3 3 + + VLEN=64b, SEW=32b, LMUL=2 + + Byte 7 6 5 4 3 2 1 0 + v2*n 1 0 + v2*n+1 3 2 + + VLEN=64b, SEW=32b, LMUL=4 + + Byte 7 6 5 4 3 2 1 0 + v4*n 1 0 + v4*n+1 3 2 + v4*n+2 5 4 + v4*n+3 7 6 + + VLEN=128b, SEW=32b, LMUL=2 + + Byte F E D C B A 9 8 7 6 5 4 3 2 1 0 + v2*n 3 2 1 0 + v2*n+1 7 6 5 4 + + VLEN=128b, SEW=32b, LMUL=4 + + Byte F E D C B A 9 8 7 6 5 4 3 2 1 0 + v4*n 3 2 1 0 + v4*n+1 7 6 5 4 + v4*n+2 B A 9 8 + v4*n+3 F E D C +---- + +[[sec-mapping-mixed]] +==== Mapping across Mixed-Width Operations + +The vector ISA is designed to support mixed-width operations without +requiring additional explicit rearrangement instructions. The +recommended software strategy when operating on multiple vectors with +different precision values is to modify `vtype` dynamically to keep +SEW/LMUL constant (and hence VLMAX constant). + +The following example shows four different packed element widths (8b, +16b, 32b, 64b) in a VLEN=128b implementation. The vector register +grouping factor (LMUL) is increased by the relative element size such +that each group can hold the same number of vector elements (VLMAX=8 +in this example) to simplify stripmining code. 
+ +---- +Example VLEN=128b, with SEW/LMUL=16 + +Byte F E D C B A 9 8 7 6 5 4 3 2 1 0 +vn - - - - - - - - 7 6 5 4 3 2 1 0 SEW=8b, LMUL=1/2 + +vn 7 6 5 4 3 2 1 0 SEW=16b, LMUL=1 + +v2*n 3 2 1 0 SEW=32b, LMUL=2 +v2*n+1 7 6 5 4 + +v4*n 1 0 SEW=64b, LMUL=4 +v4*n+1 3 2 +v4*n+2 5 4 +v4*n+3 7 6 +---- + +The following table shows each possible constant SEW/LMUL operating +point for loops with mixed-width operations. Each column represents a +constant SEW/LMUL operating point. Entries in the table are the LMUL +values that yield that column's SEW/LMUL value for the datawidth on +that row. In each column, an LMUL setting for a datawidth indicates +that it can be aligned with the other datawidths in the same column +that also have an LMUL setting, such that all have the same VLMAX. + +|=== +| 7+^| SEW/LMUL +| | 1 | 2 | 4 | 8 | 16 | 32 | 64 + +| SEW= 8 | 8 | 4 | 2 | 1 | 1/2 | 1/4 | 1/8 +| SEW= 16 | | 8 | 4 | 2 | 1 | 1/2 | 1/4 +| SEW= 32 | | | 8 | 4 | 2 | 1 | 1/2 +| SEW= 64 | | | | 8 | 4 | 2 | 1 +|=== + +Larger LMUL settings can also be used to simply increase vector length to +reduce instruction fetch and dispatch overheads in cases where fewer +vector register groups are needed. + +[[sec-mask-register-layout]] +==== Mask Register Layout + +A vector mask occupies only one vector register regardless of SEW and +LMUL. + +Each element is allocated a single mask bit in a mask vector register. +The mask bit for element _i_ is located in bit _i_ of the mask +register, independent of SEW or LMUL. + +=== Vector Instruction Formats + +The instructions in the vector extension fit under two existing major +opcodes (LOAD-FP and STORE-FP) and one new major opcode (OP-V). + +Vector loads and stores are encoded within the scalar floating-point +load and store major opcodes (LOAD-FP/STORE-FP).
The vector load and +store encodings repurpose a portion of the standard scalar +floating-point load/store 12-bit immediate field to provide further +vector instruction encoding, with bit 25 holding the standard vector +mask bit (see <>). + +include::vmem-format.adoc[] + +include::valu-format.adoc[] + +include::vcfg-format.adoc[] + +Vector instructions can have scalar or vector source operands and +produce scalar or vector results, and most vector instructions can be +performed either unconditionally or conditionally under a mask. + +Vector loads and stores move bit patterns between vector register +elements and memory. Vector arithmetic instructions operate on values +held in vector register elements. + +==== Scalar Operands + +Scalar operands can be immediates, or taken from the `x` registers, +the `f` registers, or element 0 of a vector register. Scalar results +are written to an `x` or `f` register or to element 0 of a vector +register. Any vector register can be used to hold a scalar regardless +of the current LMUL setting. + +NOTE: Zfinx ("F in X") is a proposed new ISA extension where +floating-point instructions take their arguments from the integer +register file. The vector extension is also compatible with Zfinx, +where the Zfinx vector extension has vector-scalar floating-point +instructions taking their scalar argument from the `x` registers. + +NOTE: We considered but did not pursue overlaying the `f` registers on +`v` registers. The adopted approach reduces vector register pressure, +avoids interactions with the standard calling convention, simplifies +high-performance scalar floating-point design, and provides +compatibility with the Zfinx ISA option. Overlaying `f` with `v` +would provide the advantage of lowering the number of state bits in +some implementations, but complicates high-performance designs and +would prevent compatibility with the proposed Zfinx ISA option. 
+ +[[sec-vec-operands]] +==== Vector Operands + +Each vector operand has an _effective_ _element_ _width_ (EEW) and an +_effective_ LMUL (EMUL) that is used to determine the size and +location of all the elements within a vector register group. By +default, for most operands of most instructions, EEW=SEW and +EMUL=LMUL. + +Some vector instructions have source and destination vector operands +with the same number of elements but different widths, so that EEW and +EMUL differ from SEW and LMUL respectively but EEW/EMUL = SEW/LMUL. +For example, most widening arithmetic instructions have a source group +with EEW=SEW and EMUL=LMUL but have a destination group with EEW=2*SEW and +EMUL=2*LMUL. Narrowing instructions have a source operand that has +EEW=2*SEW and EMUL=2*LMUL but with a destination where EEW=SEW and EMUL=LMUL. + +Vector operands or results may occupy one or more vector registers +depending on EMUL, but are always specified using the lowest-numbered +vector register in the group. Using other than the lowest-numbered +vector register to specify a vector register group is a reserved +encoding. + +A vector register cannot be used to provide source operands with more +than one EEW for a single instruction. A mask register source is +considered to have EEW=1 for this constraint. An encoding that would +result in the same vector register being read with two or more +different EEWs, including when the vector register appears at +different positions within two or more vector register groups, is +reserved. + +NOTE: In practice, there is no software benefit to reading the same +register with different EEW in the same instruction, and this +constraint reduces complexity for implementations that internally +rearrange data dependent on EEW. + +A destination vector register group can overlap a source vector register +group only if one of the following holds: + +- The destination EEW equals the source EEW. 
+- The destination EEW is smaller than the source EEW and the overlap is in + the lowest-numbered part of the source register group (e.g., when LMUL=1, + `vnsrl.wi v0, v0, 3` is legal, but a destination of `v1` is not). +- The destination EEW is greater than the source EEW, the source EMUL is + at least 1, and the overlap is in the highest-numbered part of the + destination register group (e.g., when LMUL=8, `vzext.vf4 v0, v6` is legal, + but a source of `v0`, `v2`, or `v4` is not). + +For the purpose of determining register group overlap constraints, +mask elements have EEW=1. + +NOTE: The overlap constraints are designed to support resumable +exceptions in machines without register renaming. + +Any instruction encoding that violates the overlap constraints is reserved. + +The largest vector register group used by an instruction can not be +greater than 8 vector registers (i.e., EMUL{le}8), and if a vector +instruction would require greater than 8 vector registers in a group, +the instruction encoding is reserved. For example, a widening +operation that produces a widened vector register group result when +LMUL=8 is reserved as this would imply a result EMUL=16. + +Widened scalar values, e.g., input and output to a widening reduction +operation, are held in the first element of a vector register and +have EMUL=1. + +==== Vector Masking + +Masking is supported on many vector instructions. Element operations +that are masked off (inactive) never generate exceptions. The +destination vector register elements corresponding to masked-off +elements are handled with either a mask-undisturbed or mask-agnostic +policy depending on the setting of the `vma` bit in `vtype` (Section +<>). + +The mask value used to control execution of a masked vector +instruction is always supplied by vector register `v0`. + +NOTE: Masks are held in vector registers, rather than in a separate mask +register file, to reduce total architectural state and to simplify the ISA. 
+ +NOTE: Future vector extensions may provide longer instruction +encodings with space for a full mask register specifier. + +The destination vector register group for a masked vector instruction +cannot overlap the source mask register (`v0`), unless the destination +vector register is being written with a mask value (e.g., compares) +or the scalar result of a reduction. These instruction encodings are +reserved. + +NOTE: This constraint supports restart with a non-zero `vstart` value. + +Other vector registers can be used to hold working mask values, and +mask vector logical operations are provided to perform predicate +calculations. [[sec-mask-vector-logical]] + +As specified in Section <>, mask destination values are +always treated as tail-agnostic, regardless of the setting of `vta`. + +[[sec-vector-mask-encoding]] +===== Mask Encoding + +Where available, masking is encoded in a single-bit `vm` field in the + instruction (`inst[25]`). + +[cols="1,15"] +|=== +| vm | Description + +| 0 | vector result, only where v0.mask[i] = 1 +| 1 | unmasked +|=== + +Vector masking is represented in assembler code as another vector +operand, with `.t` indicating that the operation occurs when +`v0.mask[i]` is `1` (`t` for "true"). If no masking operand is +specified, unmasked vector execution (`vm=1`) is assumed. + +---- + vop.v* v1, v2, v3, v0.t # enabled where v0.mask[i]=1, vm=0 + vop.v* v1, v2, v3 # unmasked vector operation, vm=1 +---- + +NOTE: Even though the current vector extensions only support one vector +mask register `v0` and only the true form of predication, the assembly +syntax writes it out in full to be compatible with future extensions +that might add a mask register specifier and support both true and +complement mask values. The `.t` suffix on the masking operand also helps +to visually encode the use of a mask. + +NOTE: The `.mask` suffix is not part of the assembly syntax. 
+We only append it in contexts where a mask vector is subscripted, +e.g., `v0.mask[i]`. + +[[sec-inactive-defs]] +==== Prestart, Active, Inactive, Body, and Tail Element Definitions + +The destination element indices operated on during a vector +instruction's execution can be divided into three disjoint subsets. + +* The _prestart_ elements are those whose element index is less than the +initial value in the `vstart` register. The prestart elements do not +raise exceptions and do not update the destination vector register. + +* The _body_ elements are those whose element index is greater than or equal +to the initial value in the `vstart` register, and less than the current +vector length setting in `vl`. The body can be split into two disjoint subsets: + +** The _active_ elements during a vector instruction's execution are the +elements within the body and where the current mask is enabled at that element +position. The active elements can raise exceptions and update the destination +vector register group. + +** The _inactive_ elements are the elements within the body +but where the current mask is disabled at that element +position. The inactive elements do not raise exceptions and do not +update any destination vector register group unless masked agnostic is +specified (`vtype.vma`=1), in which case inactive elements may be +overwritten with 1s. + +* The _tail_ elements during a vector instruction's execution are the +elements past the current vector length setting specified in `vl`. +The tail elements do not raise exceptions, and do not update any +destination vector register group unless tail agnostic is specified +(`vtype.vta`=1), in which case tail elements may be overwritten with +1s, or with the result of the instruction in the case of +mask-producing instructions except for mask loads. When LMUL < 1, the +tail includes the elements past VLMAX that are held in the same vector +register. 
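As an informal illustration of the definitions above, the following Python sketch classifies a destination element index. The function name `classify` and the `mask_bits` list standing in for `v0.mask[i]` are assumptions made for this example and are not part of the specification. The sketch also assumes that the index range extends to max(VLMAX, VLEN/SEW), so that it covers the tail elements past VLMAX when LMUL < 1, and it checks the prestart condition first.

```python
def classify(x, vstart, vl, vlmax, vlen, sew, mask_bits=None):
    """Classify destination element index x (illustrative model only).

    mask_bits is None for an unmasked instruction; otherwise it is a
    sequence of 0/1 values standing in for v0.mask[i] (hypothetical name).
    """
    limit = max(vlmax, vlen // sew)  # tail reaches past VLMAX when LMUL < 1
    if not 0 <= x < limit:
        raise ValueError("index is outside the destination register")
    if x < vstart:
        return "prestart"  # checked first; the sketch assumes vstart <= vl
    if x >= vl:
        return "tail"
    enabled = mask_bits is None or mask_bits[x] == 1
    return "active" if enabled else "inactive"


# VLEN=128, SEW=32, LMUL=1 gives VLMAX=4.
kinds = [classify(x, vstart=1, vl=3, vlmax=4, vlen=128, sew=32,
                  mask_bits=[1, 0, 1, 0]) for x in range(4)]
print(kinds)
```

With these inputs, element 0 is prestart, element 1 is inactive (its mask bit is 0), element 2 is active, and element 3 is tail.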
+ +---- + for element index x + prestart(x) = (0 <= x < vstart) + body(x) = (vstart <= x < vl) + tail(x) = (vl <= x < max(VLMAX,VLEN/SEW)) + mask(x) = unmasked || v0.mask[x] == 1 + active(x) = body(x) && mask(x) + inactive(x) = body(x) && !mask(x) +---- + +When `vstart` {ge} `vl`, there are no body elements, and no elements +are updated in any destination vector register group, including that +no tail elements are updated with agnostic values. + +NOTE: As a consequence, when `vl`=0, no elements, including agnostic +elements, are updated in the destination vector register group +regardless of `vstart`. + +Instructions that write an `x` register or `f` register +do so even when `vstart` {ge} `vl`, including when `vl`=0. + +NOTE: Some instructions such as `vslidedown` and `vrgather` may read +indices past `vl` or even VLMAX in source vector register groups. The +general policy is to return the value 0 when the index is greater than +VLMAX in the source vector register group. + +[[sec-vector-config]] +=== Configuration-Setting Instructions (`vsetvli`/`vsetivli`/`vsetvl`) + +One of the common approaches to handling a large number of elements is +"stripmining" where each iteration of a loop handles some number of elements, +and the iterations continue until all elements have been processed. The RISC-V +vector specification provides direct, portable support for this approach. +The application specifies the total number of elements to be processed (the application vector length or AVL) as a +candidate value for `vl`, and the hardware responds via a general-purpose +register with the (frequently smaller) number of elements that the hardware +will handle per iteration (stored in `vl`), based on the microarchitectural +implementation and the `vtype` setting. A straightforward loop structure, +shown in <>, depicts the ease with which the code keeps +track of the remaining number of elements and the amount per iteration handled +by hardware. 
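The stripmining pattern described above can be modeled in a few lines of Python. This is a sketch under a simplifying assumption: the hypothetical helper `grant_vl` returns min(AVL, VLMAX), which is one choice the hardware may make, while real implementations are permitted to grant a different number of elements in some cases. Both function names are invented for this illustration.

```python
def grant_vl(avl, vlmax):
    """Model of the hardware's reply: assumes the simple policy min(AVL, VLMAX)."""
    return min(avl, vlmax)


def stripmine(n, vlmax):
    """Yield (start_index, vl) for each loop iteration until n elements are done."""
    start = 0
    while n > 0:
        vl = grant_vl(n, vlmax)  # request all remaining elements as the AVL
        yield start, vl
        start += vl              # advance past the elements just processed
        n -= vl                  # remaining element count


chunks = list(stripmine(10, 4))
print(chunks)
```

The loop only tracks the remaining count and the granted `vl`; it never needs to know VLMAX itself, which is what makes the pattern portable across implementations.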
+ +A set of instructions is provided to allow rapid configuration of the +values in `vl` and `vtype` to match application needs. The +`vset{i}vl{i}` instructions set the `vtype` and `vl` CSRs based on +their arguments, and write the new value of `vl` into `rd`. + +---- + vsetvli rd, rs1, vtypei # rd = new vl, rs1 = AVL, vtypei = new vtype setting + vsetivli rd, uimm, vtypei # rd = new vl, uimm = AVL, vtypei = new vtype setting + vsetvl rd, rs1, rs2 # rd = new vl, rs1 = AVL, rs2 = new vtype value +---- + +include::vcfg-format.adoc[] + +==== `vtype` encoding + +include::vtype-format.adoc[] + +The new `vtype` value is encoded in the immediate fields of `vsetvli` +and `vsetivli`, and in the `rs2` register for `vsetvl`. + +---- + Suggested assembler names used for vset{i}vli vtypei immediate + + e8 # SEW=8b + e16 # SEW=16b + e32 # SEW=32b + e64 # SEW=64b + + mf8 # LMUL=1/8 + mf4 # LMUL=1/4 + mf2 # LMUL=1/2 + m1 # LMUL=1, assumed if m setting absent + m2 # LMUL=2 + m4 # LMUL=4 + m8 # LMUL=8 + +Examples: + vsetvli t0, a0, e8, ta, ma # SEW= 8, LMUL=1 + vsetvli t0, a0, e8, m2, ta, ma # SEW= 8, LMUL=2 + vsetvli t0, a0, e32, mf2, ta, ma # SEW=32, LMUL=1/2 +---- + +The `vsetvl` variant operates similarly to `vsetvli` except that it +takes a `vtype` value from `rs2` and can be used for context restore. + +===== Unsupported `vtype` Values + +If the `vtype` value is not supported by the implementation, then +the `vill` bit is set in `vtype`, the remaining bits in `vtype` are +set to zero, and the `vl` register is also set to zero. + +NOTE: Earlier drafts required a trap when setting `vtype` to an +illegal value. However, this would have added the first +data-dependent trap on a CSR write to the ISA. Implementations could +choose to trap when illegal values are written to `vtype` instead of +setting `vill`, to allow emulation to support new configurations for +forward-compatibility. 
The current scheme supports light-weight +runtime interrogation of the supported vector unit configurations by +checking if `vill` is clear for a given setting. + +A `vtype` value with `vill` set is treated as an unsupported +configuration. + +Implementations must consider all bits of the `vtype` value to +determine if the configuration is supported. An unsupported value in +any location within the `vtype` value must result in `vill` being set. + +NOTE: In particular, all XLEN bits of the register `vtype` argument to +the `vsetvl` instruction must be checked. Implementations cannot +ignore fields they do not implement. All bits must be checked to +ensure that new code assuming unsupported vector features in `vtype` +traps instead of executing incorrectly on an older implementation. + +==== AVL encoding + +The new vector +length setting is based on AVL, which for `vsetvli` and `vsetvl` is encoded in the `rs1` and `rd` +fields as follows: + +.AVL used in `vsetvli` and `vsetvl` instructions +[cols="2,2,10,10"] +[%autowidth] +|=== +| `rd` | `rs1` | AVL value | Effect on `vl` +| - | !x0 | Value in `x[rs1]` | Normal stripmining +| !x0 | x0 | ~0 | Set `vl` to VLMAX +| x0 | x0 | Value in `vl` register | Keep existing `vl` (of course, `vtype` may change) +|=== + +When `rs1` is not `x0`, the AVL is an unsigned integer held in the `x` +register specified by `rs1`, and the new `vl` value is also written to +the `x` register specified by `rd`. + +When `rs1=x0` but `rd!=x0`, the maximum unsigned integer value (`~0`) +is used as the AVL, and the resulting VLMAX is written to `vl` and +also to the `x` register specified by `rd`. + +When `rs1=x0` and `rd=x0`, the instruction operates as if the current +vector length in `vl` is used as the AVL, and the resulting value is +written to `vl`, but not to a destination register. This form can +only be used when VLMAX and hence `vl` is not actually changed by the +new SEW/LMUL ratio. 
Use of the instruction with a new SEW/LMUL ratio +that would result in a change of VLMAX is reserved. Implementations +may set `vill` in this case. + +NOTE: This last form of the instructions allows the `vtype` register to +be changed while maintaining the current `vl`, provided VLMAX is not +reduced. This design was chosen to ensure `vl` would always hold a +legal value for the current `vtype` setting. The current `vl` value can +be read from the `vl` CSR. The `vl` value could be reduced by this +instruction if the new SEW/LMUL ratio causes VLMAX to shrink, and so +this case has been reserved as it is not clear this is a generally +useful operation, and implementations can otherwise assume `vl` is not +changed by this instruction to optimize their microarchitecture. + +For the `vsetivli` instruction, the AVL is encoded as a 5-bit +zero-extended immediate (0--31) in the `rs1` field. + +NOTE: The encoding of AVL for `vsetivli` is the same as for regular +CSR immediate values. + +NOTE: The `vsetivli` instruction provides more compact code when the +dimensions of vectors are small and known to fit inside the vector +registers, in which case there is no stripmining overhead. + +==== Constraints on Setting `vl` + +The `vset{i}vl{i}` instructions first set VLMAX according to their `vtype` +argument, then set `vl` obeying the following constraints: + +. `vl = AVL` if `AVL {le} VLMAX` +. `ceil(AVL / 2) {le} vl {le} VLMAX` if `AVL < (2 * VLMAX)` +. `vl = VLMAX` if `AVL {ge} (2 * VLMAX)` +. Deterministic on any given implementation for the same input AVL and VLMAX values +. These specific properties follow from the prior rules: +.. `vl = 0` if `AVL = 0` +.. `vl > 0` if `AVL > 0` +.. `vl {le} VLMAX` +.. `vl {le} AVL` +..
a value read from `vl` when used as the AVL argument to `vset{i}vl{i}` results in the same +value in `vl`, provided the resultant VLMAX equals the value of VLMAX at the time that `vl` was read + +[NOTE] +-- +The `vl` setting rules are designed to be sufficiently strict to +preserve `vl` behavior across register spills and context swaps for +`AVL {le} VLMAX`, yet flexible enough to enable implementations to improve +vector lane utilization for `AVL > VLMAX`. + +For example, this permits an implementation to set `vl = ceil(AVL / 2)` +for `VLMAX < AVL < 2*VLMAX` in order to evenly distribute work over the +last two iterations of a stripmine loop. +Requirement 2 ensures that the first stripmine iteration of reduction +loops uses the largest vector length of all iterations, even in the case +of `AVL < 2*VLMAX`. +This allows software to avoid needing to explicitly calculate a running +maximum of vector lengths observed during a stripmined loop. +Requirement 2 also allows an implementation to set `vl` to VLMAX for `VLMAX < AVL < 2*VLMAX`. +-- + +[[example-stripmine-sew]] +==== Example of stripmining and changes to SEW + +The SEW and LMUL settings can be changed dynamically to provide high +throughput on mixed-width operations in a single loop. +---- +# Example: Load 16-bit values, widen multiply to 32b, shift 32b result +# right by 3, store 32b values.
+# On entry: +# a0 holds the total number of elements to process +# a1 holds the address of the source array +# a2 holds the address of the destination array + +loop: + vsetvli a3, a0, e16, m4, ta, ma # vtype = 16-bit integer vectors; + # also update a3 with vl (# of elements this iteration) + vle16.v v4, (a1) # Get 16b vector + slli t1, a3, 1 # Multiply # elements this iteration by 2 bytes/source element + add a1, a1, t1 # Bump pointer + vwmul.vx v8, v4, x10 # Widening multiply into 32b in + + vsetvli x0, x0, e32, m8, ta, ma # Operate on 32b values + vsrl.vi v8, v8, 3 + vse32.v v8, (a2) # Store vector of 32b elements + slli t1, a3, 2 # Multiply # elements this iteration by 4 bytes/destination element + add a2, a2, t1 # Bump pointer + sub a0, a0, a3 # Decrement count by vl + bnez a0, loop # Any more? +---- + +[[sec-vector-memory]] +=== Vector Loads and Stores + +Vector loads and stores move values between vector registers and +memory. +Vector loads and stores can be masked, and they only access memory or raise +exceptions for active elements. +Masked vector loads do not update inactive elements in the destination vector +register group, unless masked agnostic is specified (`vtype.vma`=1). +All vector loads and stores may +generate and accept a non-zero `vstart` value. + +==== Vector Load/Store Instruction Encoding + +Vector loads and stores are encoded within the scalar floating-point +load and store major opcodes (LOAD-FP/STORE-FP). The vector load and +store encodings repurpose a portion of the standard scalar +floating-point load/store 12-bit immediate field to provide further +vector instruction encoding, with bit 25 holding the standard vector +mask bit (see <>). 
+ +include::vmem-format.adoc[] + +[cols="4,12"] +|=== +| Field | Description + +| rs1[4:0] | specifies x register holding base address +| rs2[4:0] | specifies x register holding stride +| vs2[4:0] | specifies v register holding address offsets +| vs3[4:0] | specifies v register holding store data +| vd[4:0] | specifies v register destination of load +| vm | specifies whether vector masking is enabled (0 = mask enabled, 1 = mask disabled) +| width[2:0] | specifies size of memory elements, and distinguishes from FP scalar +| mew | extended memory element width. See <> +| mop[1:0] | specifies memory addressing mode +| nf[2:0] | specifies the number of fields in each segment, for segment load/stores +| lumop[4:0]/sumop[4:0] | are additional fields encoding variants of unit-stride instructions +|=== + +Vector memory unit-stride and constant-stride operations directly +encode EEW of the data to be transferred statically in the instruction +to reduce the number of `vtype` changes when accessing memory in a +mixed-width routine. Indexed operations use the explicit EEW encoding +in the instruction to set the size of the indices used, and use +SEW/LMUL to specify the data width. + +==== Vector Load/Store Addressing Modes + +The vector extension supports unit-stride, strided, and +indexed (scatter/gather) addressing modes. Vector load/store base +registers and strides are taken from the GPR `x` registers. + +The base effective address for all vector accesses is given by the +contents of the `x` register named in `rs1`. + +Vector unit-stride operations access elements stored contiguously in +memory starting from the base effective address. + +Vector constant-strided operations access the first memory element at the base +effective address, and then access subsequent elements at address +increments given by the byte offset contained in the `x` register +specified by `rs2`. 
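The unit-stride and constant-stride address calculations described above can be sketched as follows. This is a simplified model: it ignores masking, `vstart`, and address wraparound at XLEN, and the function names are assumptions made for this illustration.

```python
def unit_stride_addresses(base, eew_bits, vl):
    """Addresses of vl contiguous elements, each eew_bits wide, starting at base."""
    step = eew_bits // 8  # element size in bytes
    return [base + i * step for i in range(vl)]


def strided_addresses(base, stride_bytes, vl):
    """Addresses of vl elements separated by the byte stride held in x[rs2]."""
    return [base + i * stride_bytes for i in range(vl)]


unit = unit_stride_addresses(0x1000, 32, 4)
strided = strided_addresses(0x1000, 12, 3)
print([hex(a) for a in unit])
print([hex(a) for a in strided])
```

In both cases the first element is accessed at the base effective address taken from `rs1`; only the increment between elements differs.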
+ +Vector indexed operations add the contents of each element of the +vector offset operand specified by `vs2` to the base effective address +to give the effective address of each element. The data vector +register group has EEW=SEW, EMUL=LMUL, while the offset vector +register group has EEW encoded in the instruction and +EMUL=(EEW/SEW)*LMUL. + +The vector offset operand is treated as a vector of byte-address +offsets. + +NOTE: The indexed operations can also be used to access fields within +a vector of objects, where the `vs2` vector holds pointers to the base +of the objects and the scalar `x` register holds the offset of the +member field in each object. Supporting this case is why the indexed +operations were not defined to scale the element indices by the data +EEW. + +If the vector offset elements are narrower than XLEN, they are +zero-extended to XLEN before adding to the base effective address. If +the vector offset elements are wider than XLEN, the least-significant +XLEN bits are used in the address calculation. An implementation must +raise an illegal instruction exception if the EEW is not supported for +offset elements. + +NOTE: A profile may place an upper limit on the maximum supported index +EEW (e.g., only up to XLEN) smaller than ELEN. + +The vector addressing modes are encoded using the 2-bit `mop[1:0]` +field. + +.encoding for loads +[cols="1,1,7,6"] +|=== +2+| mop [1:0] | Description | Opcodes + +| 0 | 0 | unit-stride | VLE +| 0 | 1 | indexed-unordered | VLUXEI +| 1 | 0 | strided | VLSE +| 1 | 1 | indexed-ordered | VLOXEI +|=== + +.encoding for stores +[cols="1,1,7,6"] +|=== +2+| mop [1:0] | Description | Opcodes + +| 0 | 0 | unit-stride | VSE +| 0 | 1 | indexed-unordered | VSUXEI +| 1 | 0 | strided | VSSE +| 1 | 1 | indexed-ordered | VSOXEI +|=== + +Vector unit-stride and constant-stride memory accesses do not +guarantee ordering between individual element accesses. 
The vector +indexed load and store memory operations have two forms, ordered and +unordered. The indexed-ordered variants preserve element ordering on +memory accesses. + +For unordered instructions (`mop[1:0]`!=11) there is no guarantee on +element access order. If the accesses are to a strongly ordered IO +region, the element accesses can be initiated in any order. + +NOTE: To provide ordered vector accesses to a strongly ordered IO +region, the ordered indexed instructions should be used. + +For implementations with precise vector traps, exceptions on +indexed-unordered stores must also be precise. + +Additional unit-stride vector addressing modes are encoded using the +5-bit `lumop` and `sumop` fields in the unit-stride load and store +instruction encodings respectively. + +.lumop +[cols="1,1,1,1,1,11"] +|=== +5+| lumop[4:0] | Description + +| 0 | 0 | 0 | 0 | 0 | unit-stride load +| 0 | 1 | 0 | 0 | 0 | unit-stride, whole register load +| 0 | 1 | 0 | 1 | 1 | unit-stride, mask load, EEW=8 +| 1 | 0 | 0 | 0 | 0 | unit-stride fault-only-first +| x | x | x | x | x | other encodings reserved +|=== + +.sumop +[cols="1,1,1,1,1,11"] +|=== +5+| sumop[4:0] | Description + +| 0 | 0 | 0 | 0 | 0 | unit-stride store +| 0 | 1 | 0 | 0 | 0 | unit-stride, whole register store +| 0 | 1 | 0 | 1 | 1 | unit-stride, mask store, EEW=8 +| x | x | x | x | x | other encodings reserved +|=== + +The `nf[2:0]` field encodes the number of fields in each segment. For +regular vector loads and stores, `nf`=0, indicating that a single +value is moved between a vector register group and memory at each +element position. Larger values in the `nf` field are used to access +multiple contiguous fields within a segment as described below in +Section <>. + +The `nf[2:0]` field also encodes the number of whole vector registers +to transfer for the whole vector register load/store instructions. 
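The `mop`, `lumop`, and `sumop` tables above can be read as a small decoder. The Python sketch below transcribes those tables into dictionaries; the function name `describe` and its `is_store` parameter are hypothetical, and the sketch does not model the `nf` or `width` fields.

```python
# Addressing modes selected by mop[1:0] (same values for loads and stores).
MOP = {
    0b00: "unit-stride",
    0b01: "indexed-unordered",
    0b10: "strided",
    0b11: "indexed-ordered",
}

# Unit-stride variants selected by lumop[4:0] (loads) and sumop[4:0] (stores).
LUMOP = {
    0b00000: "unit-stride load",
    0b01000: "unit-stride, whole register load",
    0b01011: "unit-stride, mask load, EEW=8",
    0b10000: "unit-stride fault-only-first",
}
SUMOP = {
    0b00000: "unit-stride store",
    0b01000: "unit-stride, whole register store",
    0b01011: "unit-stride, mask store, EEW=8",
}


def describe(mop, umop=0, is_store=False):
    """Return a description of the addressing mode, or 'reserved'."""
    mode = MOP[mop & 0b11]
    if mode != "unit-stride":
        return mode  # lumop/sumop only apply to unit-stride encodings
    table = SUMOP if is_store else LUMOP
    return table.get(umop & 0b11111, "reserved")


print(describe(0b00, 0b01011))
print(describe(0b00, 0b10000, is_store=True))
```

Fault-only-first exists only in the load table, so the same `10000` pattern decodes as reserved for a store.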
+ +[[sec-vector-loadstore-width-encoding]] +==== Vector Load/Store Width Encoding + +Vector loads and stores have an EEW encoded directly in the +instruction. The corresponding EMUL is calculated as EMUL = +(EEW/SEW)*LMUL. If the EMUL would be out of range (EMUL>8 or +EMUL<1/8), the instruction encoding is reserved. The vector register +groups must have legal register specifiers for the selected EMUL, +otherwise the instruction encoding is reserved. + +Vector unit-stride and constant-stride use the EEW/EMUL encoded in the +instruction for the data values, while vector indexed loads and stores +use the EEW/EMUL encoded in the instruction for the index values and +the SEW/LMUL encoded in `vtype` for the data values. + +Vector loads and stores are encoded using width values that are not +claimed by the standard scalar floating-point loads and stores. + +Implementations must provide vector loads and stores with EEWs +corresponding to all supported SEW settings. Vector load/store +encodings for unsupported EEW widths must raise an illegal +instruction exception. + +.Width encoding for vector loads and stores. 
+[cols="5,1,1,1,1,>3,>3,>3,3"] +|=== +| | mew 3+| width [2:0] | Mem bits | Data Reg bits | Index bits | Opcodes + +| Standard scalar FP | x | 0 | 0 | 1 | 16| FLEN | - | FLH/FSH +| Standard scalar FP | x | 0 | 1 | 0 | 32| FLEN | - | FLW/FSW +| Standard scalar FP | x | 0 | 1 | 1 | 64| FLEN | - | FLD/FSD +| Standard scalar FP | x | 1 | 0 | 0 | 128| FLEN | - | FLQ/FSQ +| Vector 8b element | 0 | 0 | 0 | 0 | 8| 8 | - | VLxE8/VSxE8 +| Vector 16b element | 0 | 1 | 0 | 1 | 16| 16 | - | VLxE16/VSxE16 +| Vector 32b element | 0 | 1 | 1 | 0 | 32| 32 | - | VLxE32/VSxE32 +| Vector 64b element | 0 | 1 | 1 | 1 | 64| 64 | - | VLxE64/VSxE64 +| Vector 8b index | 0 | 0 | 0 | 0 | SEW | SEW | 8 | VLxEI8/VSxEI8 +| Vector 16b index | 0 | 1 | 0 | 1 | SEW | SEW | 16 | VLxEI16/VSxEI16 +| Vector 32b index | 0 | 1 | 1 | 0 | SEW | SEW | 32 | VLxEI32/VSxEI32 +| Vector 64b index | 0 | 1 | 1 | 1 | SEW | SEW | 64 | VLxEI64/VSxEI64 +| Reserved | 1 | X | X | X | - | - | - | +|=== + +Mem bits is the size of each element accessed in memory. + +Data reg bits is the size of each data element accessed in register. + +Index bits is the size of each index accessed in register. + +The `mew` bit (`inst[28]`) when set is expected to be used to encode +expanded memory sizes of 128 bits and above, but these encodings are +currently reserved. 
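The EMUL rule given at the start of this section, EMUL = (EEW/SEW)*LMUL with a legal range of 1/8 to 8, can be checked with exact rational arithmetic. The following Python sketch is illustrative only; the function name `load_store_emul` and the choice of returning `None` for a reserved encoding are assumptions of this example.

```python
from fractions import Fraction


def load_store_emul(eew, sew, lmul):
    """Return EMUL = (EEW/SEW)*LMUL, or None when the encoding is reserved."""
    emul = Fraction(eew, sew) * Fraction(lmul)
    if emul > 8 or emul < Fraction(1, 8):
        return None  # out of range: EMUL > 8 or EMUL < 1/8
    return emul


print(load_store_emul(16, 32, 1))               # 16-bit load while SEW=32, LMUL=1
print(load_store_emul(64, 8, 2))                # would need EMUL=16
print(load_store_emul(8, 64, Fraction(1, 2)))   # would need EMUL=1/16
```

Using `Fraction` avoids floating-point rounding for fractional LMUL values such as 1/2, 1/4, and 1/8.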
+ +==== Vector Unit-Stride Instructions + +---- + # Vector unit-stride loads and stores + + # vd destination, rs1 base address, vm is mask encoding (v0.t or ) + vle8.v vd, (rs1), vm # 8-bit unit-stride load + vle16.v vd, (rs1), vm # 16-bit unit-stride load + vle32.v vd, (rs1), vm # 32-bit unit-stride load + vle64.v vd, (rs1), vm # 64-bit unit-stride load + + # vs3 store data, rs1 base address, vm is mask encoding (v0.t or ) + vse8.v vs3, (rs1), vm # 8-bit unit-stride store + vse16.v vs3, (rs1), vm # 16-bit unit-stride store + vse32.v vs3, (rs1), vm # 32-bit unit-stride store + vse64.v vs3, (rs1), vm # 64-bit unit-stride store +---- + +Additional unit-stride mask load and store instructions are +provided to transfer mask values to/from memory. These +operate similarly to unmasked byte loads or stores (EEW=8), except that +the effective vector length is ``evl``=ceil(``vl``/8) (i.e. EMUL=1), +and the destination register is always written with a tail-agnostic +policy. + +---- + # Vector unit-stride mask load + vlm.v vd, (rs1) # Load byte vector of length ceil(vl/8) + + # Vector unit-stride mask store + vsm.v vs3, (rs1) # Store byte vector of length ceil(vl/8) +---- + +`vlm.v` and `vsm.v` are encoded with the same `width[2:0]`=0 encoding as +`vle8.v` and `vse8.v`, but are distinguished by different +`lumop` and `sumop` encodings. Since `vlm.v` and `vsm.v` operate as byte loads and stores, +`vstart` is in units of bytes for these instructions. + +NOTE: `vlm.v` and `vsm.v` respect the `vill` field in `vtype`, as +they depend on `vtype` indirectly through its constraints on `vl`. + +NOTE: The previous assembler mnemonics `vle1.v` and `vse1.v` were +confusing as length was handled differently for these instructions +versus other element load/store instructions. To avoid software +churn, these older assembly mnemonics are being retained as aliases. 
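The effective vector length used by `vlm.v` and `vsm.v`, ceil(`vl`/8), and the resulting byte image can be illustrated with a short Python sketch. The helper names are hypothetical, and the sketch assumes that any unused bits in the final byte are zero, whereas a real store writes whatever the mask register holds in those bit positions.

```python
def mask_evl(vl):
    """Effective vector length in bytes for a mask load/store: ceil(vl / 8)."""
    return (vl + 7) // 8


def pack_mask(bits):
    """Pack mask bits into bytes, with the bit for element i in bit i of the result."""
    out = bytearray(mask_evl(len(bits)))
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (i % 8)  # byte i//8, bit position i%8
    return bytes(out)


packed = pack_mask([1, 0, 1, 1, 0, 0, 0, 0, 1])  # vl = 9 elements
print(mask_evl(9), packed.hex())
```

Nine mask elements occupy two bytes: elements 0, 2, and 3 set bits in the first byte, and element 8 sets bit 0 of the second byte.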
+ +NOTE: The primary motivation to provide mask load and store is to +support machines that internally rearrange data to reduce +cross-datapath wiring. However, these instructions also provide a convenient +mechanism to use packed bit vectors in memory as mask values, +and also reduce the cost of mask spill/fill by reducing need to change +`vl`. + +==== Vector Strided Instructions + +---- + # Vector strided loads and stores + + # vd destination, rs1 base address, rs2 byte stride + vlse8.v vd, (rs1), rs2, vm # 8-bit strided load + vlse16.v vd, (rs1), rs2, vm # 16-bit strided load + vlse32.v vd, (rs1), rs2, vm # 32-bit strided load + vlse64.v vd, (rs1), rs2, vm # 64-bit strided load + + # vs3 store data, rs1 base address, rs2 byte stride + vsse8.v vs3, (rs1), rs2, vm # 8-bit strided store + vsse16.v vs3, (rs1), rs2, vm # 16-bit strided store + vsse32.v vs3, (rs1), rs2, vm # 32-bit strided store + vsse64.v vs3, (rs1), rs2, vm # 64-bit strided store +---- + +Negative and zero strides are supported. + +Element accesses within a strided instruction are unordered with +respect to each other. + +When `rs2`=`x0`, then an implementation is allowed, but not required, +to perform fewer memory operations than the number of active elements, +and may perform different numbers of memory operations across +different dynamic executions of the same static instruction. + +NOTE: Compilers must be aware to not use the `x0` form for rs2 when +the immediate stride is `0` if the intent is to require all memory +accesses are performed. + +When `rs2!=x0` and the value of `x[rs2]=0`, the implementation must +perform one memory access for each active element (but these accesses +will not be ordered). + +NOTE: As with other architectural mandates, implementations must +_appear_ to perform each memory access. Microarchitectures are +free to optimize away accesses that would not be observed by another +agent, for example, in idempotent memory regions obeying RVWMO. 
For +non-idempotent memory regions, where by definition each access can be +observed by a device, the optimization would not be possible. + +NOTE: When repeating ordered vector accesses to the same memory +address are required, then an ordered indexed operation can be used. + +==== Vector Indexed Instructions + +---- + # Vector indexed loads and stores + + # Vector indexed-unordered load instructions + # vd destination, rs1 base address, vs2 byte offsets + vluxei8.v vd, (rs1), vs2, vm # unordered 8-bit indexed load of SEW data + vluxei16.v vd, (rs1), vs2, vm # unordered 16-bit indexed load of SEW data + vluxei32.v vd, (rs1), vs2, vm # unordered 32-bit indexed load of SEW data + vluxei64.v vd, (rs1), vs2, vm # unordered 64-bit indexed load of SEW data + + # Vector indexed-ordered load instructions + # vd destination, rs1 base address, vs2 byte offsets + vloxei8.v vd, (rs1), vs2, vm # ordered 8-bit indexed load of SEW data + vloxei16.v vd, (rs1), vs2, vm # ordered 16-bit indexed load of SEW data + vloxei32.v vd, (rs1), vs2, vm # ordered 32-bit indexed load of SEW data + vloxei64.v vd, (rs1), vs2, vm # ordered 64-bit indexed load of SEW data + + # Vector indexed-unordered store instructions + # vs3 store data, rs1 base address, vs2 byte offsets + vsuxei8.v vs3, (rs1), vs2, vm # unordered 8-bit indexed store of SEW data + vsuxei16.v vs3, (rs1), vs2, vm # unordered 16-bit indexed store of SEW data + vsuxei32.v vs3, (rs1), vs2, vm # unordered 32-bit indexed store of SEW data + vsuxei64.v vs3, (rs1), vs2, vm # unordered 64-bit indexed store of SEW data + + # Vector indexed-ordered store instructions + # vs3 store data, rs1 base address, vs2 byte offsets + vsoxei8.v vs3, (rs1), vs2, vm # ordered 8-bit indexed store of SEW data + vsoxei16.v vs3, (rs1), vs2, vm # ordered 16-bit indexed store of SEW data + vsoxei32.v vs3, (rs1), vs2, vm # ordered 32-bit indexed store of SEW data + vsoxei64.v vs3, (rs1), vs2, vm # ordered 64-bit indexed store of SEW data + +---- + +NOTE: The 
assembler syntax for indexed loads and stores uses +``ei``__x__ instead of ``e``__x__ to indicate the statically encoded EEW +is of the index not the data. + +NOTE: The indexed operations mnemonics have a "U" or "O" to +distinguish between unordered and ordered, while the other vector +addressing modes have no character. While this is perhaps a little +less consistent, this approach minimizes disruption to existing +software, as VSXEI previously meant "ordered" - and the opcode can be +retained as an alias during transition to help reduce software churn. + +==== Unit-stride Fault-Only-First Loads + +The unit-stride fault-only-first load instructions are used to +vectorize loops with data-dependent exit conditions ("while" loops). +These instructions execute as a regular load except that they will +only take a trap caused by a synchronous exception on element 0. If +element 0 raises an exception, `vl` is not modified, and the trap is +taken. If an element > 0 raises an exception, the corresponding trap +is not taken, and the vector length `vl` is reduced to the index of +the element that would have raised an exception. + +Load instructions may overwrite active destination vector register +group elements past the element index at which the trap is reported. +Similarly, fault-only-first load instructions may update active destination +elements past the element that causes trimming of the vector length +(but not past the original vector length). The values of these +spurious updates do not have to correspond to the values in memory at +the addressed memory locations. Non-idempotent memory locations can +only be accessed when it is known the corresponding element load +operation will not be restarted due to a trap or vector-length +trimming. 
+ +---- + # Vector unit-stride fault-only-first loads + + # vd destination, rs1 base address, vm is mask encoding (v0.t or <missing>) + vle8ff.v vd, (rs1), vm # 8-bit unit-stride fault-only-first load + vle16ff.v vd, (rs1), vm # 16-bit unit-stride fault-only-first load + vle32ff.v vd, (rs1), vm # 32-bit unit-stride fault-only-first load + vle64ff.v vd, (rs1), vm # 64-bit unit-stride fault-only-first load +---- + +---- +strlen example using unit-stride fault-only-first instruction + +include::example/strlen.s[lines=4..-1] +---- + +NOTE: There is a security concern with fault-only-first loads, as they +can be used to probe for valid effective addresses. The unit-stride +versions only allow probing a region immediately contiguous to a known +region, and so reduce the security impact when used in unprivileged +code. However, code running in S-mode can establish arbitrary page +translations that allow probing of random guest physical addresses +provided by a hypervisor. Strided and scatter/gather fault-only-first +instructions are not provided due to lack of encoding space, but they +can also represent a larger security hole, allowing even unprivileged +software to easily check multiple random pages for accessibility +without experiencing a trap. This standard does not address possible +security mitigations for fault-only-first instructions. + +Even when an exception is not raised, implementations are permitted to process +fewer than `vl` elements and reduce `vl` accordingly, but if `vstart`=0 and +`vl`>0, then at least one element must be processed. + +When the fault-only-first instruction takes a trap due to an +interrupt, implementations should not reduce `vl` and should instead +set a `vstart` value. + +NOTE: When the fault-only-first instruction would trigger a debug +data-watchpoint trap on an element after the first, implementations +should not reduce `vl` but instead should trigger the debug trap as +otherwise the event might be lost.
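The trap and trimming behaviour of fault-only-first loads described in this section can be modelled in a few lines. This is a hedged sketch under simplifying assumptions: memory is a dictionary in which a missing address stands for a synchronous exception, masking and `vstart` are ignored, and the model always processes as many elements as it can (a real implementation may process fewer). The name `vle_ff` is hypothetical.

```python
def vle_ff(memory, base, vl, eew_bytes=1):
    """Model a unit-stride fault-only-first load.

    memory: dict mapping byte address -> byte value; a missing address
            models an access that would raise a synchronous exception.
    Returns (elements, new_vl). Raises MemoryError if element 0 faults,
    in which case vl is conceptually left unmodified.
    """
    elements = []
    for i in range(vl):
        addrs = range(base + i * eew_bytes, base + (i + 1) * eew_bytes)
        if any(a not in memory for a in addrs):
            if i == 0:
                # Element 0: the trap is taken and vl is not modified.
                raise MemoryError("trap on element 0")
            # Element > 0: no trap; vl is reduced to the faulting index.
            return elements, i
        raw = bytes(memory[a] for a in addrs)
        elements.append(int.from_bytes(raw, "little"))
    return elements, vl
```

A strlen-style loop would call this with the maximum `vl`, search the returned elements for a zero byte, and advance by the returned `new_vl` when none is found.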
+ +[[sec-aos]] +==== Vector Load/Store Segment Instructions + +The vector load/store segment instructions move multiple contiguous +fields in memory to and from consecutively numbered vector registers. + +NOTE: The name "segment" reflects that the items moved are subarrays +with homogeneous elements. These operations can be used to transpose +arrays between memory and registers, and can support operations on +"array-of-structures" datatypes by unpacking each field in a structure +into a separate vector register. + +The three-bit `nf` field in the vector instruction encoding is an +unsigned integer that contains one less than the number of fields per +segment, _NFIELDS_. + +[[fig-nf]] +.NFIELDS Encoding +[cols="1,1,1,13"] +|=== +3+| nf[2:0] | NFIELDS + +| 0 | 0 | 0 | 1 +| 0 | 0 | 1 | 2 +| 0 | 1 | 0 | 3 +| 0 | 1 | 1 | 4 +| 1 | 0 | 0 | 5 +| 1 | 0 | 1 | 6 +| 1 | 1 | 0 | 7 +| 1 | 1 | 1 | 8 +|=== + +The EMUL setting must be such that EMUL * NFIELDS {le} 8, otherwise +the instruction encoding is reserved. + +NOTE: The product ceil(EMUL) * NFIELDS represents the number of underlying +vector registers that will be touched by a segmented load or store +instruction. This constraint makes this total no larger than 1/4 of +the architectural register file, and the same as for regular +operations with EMUL=8. + +Each field will be held in successively numbered vector register +groups. When EMUL>1, each field will occupy a vector register group +held in multiple successively numbered vector registers, and the +vector register group for each field must follow the usual vector +register alignment constraints (e.g., when EMUL=2 and NFIELDS=4, each +field's vector register group must start at an even vector register, +but does not have to start at a multiple of 8 vector register number). + +If the vector register numbers accessed by the segment load or store +would increment past 31, then the instruction encoding is reserved. 
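The register-group constraints above (EMUL * NFIELDS no greater than 8, per-field alignment, and no register number past 31) can be checked mechanically. The sketch below is illustrative only; `segment_encoding_ok` is a hypothetical helper, and it assumes that a fractional EMUL still occupies one whole vector register per field, following the ceil(EMUL) wording in the note above.

```python
from fractions import Fraction
import math


def segment_encoding_ok(vd, emul, nfields):
    """Return True if a segment access starting at register vd is not reserved."""
    emul = Fraction(emul)
    if not 1 <= nfields <= 8 or not 0 <= vd <= 31:
        return False
    # EMUL * NFIELDS must not exceed 8.
    if emul * nfields > 8:
        return False
    # Each field occupies ceil(EMUL) registers and must be aligned to that size.
    regs_per_field = math.ceil(emul)
    if vd % regs_per_field != 0:
        return False
    # The highest register touched must not go past v31.
    last_reg = vd + regs_per_field * nfields - 1
    return last_reg <= 31
```

With EMUL=2 and NFIELDS=4, a base of `v10` passes (even, but not a multiple of 8), matching the example in the text.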
+ +NOTE: This constraint is to help allow for forward-compatibility with +a possible future longer instruction encoding that has more +addressable vector registers. + +The `vl` register gives the number of segments to move, which is +equal to the number of elements transferred to each vector register +group. Masking is also applied at the level of whole segments. + +For segment loads and stores, the individual memory accesses used to +access fields within each segment are unordered with respect to each +other even for ordered indexed segment loads and stores. + +The `vstart` value is in units of whole segments. If a trap occurs during +access to a segment, it is implementation-defined whether a subset +of the faulting segment's accesses are performed before the trap is taken. + +===== Vector Unit-Stride Segment Loads and Stores + +The vector unit-stride load and store segment instructions move packed +contiguous segments into multiple destination vector register groups. + +NOTE: Where the segments hold structures with heterogeneous-sized +fields, software can later unpack individual structure fields using +additional instructions after the segment load brings data into the +vector registers. + +The assembler prefixes `vlseg`/`vsseg` are used for unit-stride +segment loads and stores respectively. + +---- + # Format + vlseg<nf>e<eew>.v vd, (rs1), vm # Unit-stride segment load template + vsseg<nf>e<eew>.v vs3, (rs1), vm # Unit-stride segment store template + + # Examples + vlseg8e8.v vd, (rs1), vm # Load eight vector registers with eight byte fields. + + vsseg3e32.v vs3, (rs1), vm # Store packed vector of 3*4-byte segments from vs3,vs3+1,vs3+2 to memory +---- + +For loads, the `vd` register will hold the first field loaded from the +segment. For stores, the `vs3` register is read to provide the first +field to be stored to each segment.
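The way a unit-stride segment load distributes packed fields across successive registers can be sketched as a de-interleave. This is a simplified model, not the specification's definition: it ignores masking, `vstart`, and EMUL greater than 1, and `vlseg_unit_stride` is a hypothetical name. The returned list index `f` stands for register `vd+f`.

```python
def vlseg_unit_stride(mem, base, nfields, eew_bytes, vl):
    """Model a unit-stride segment load of vl segments from a bytes object.

    Returns one list per field; list f models vector register vd+f.
    """
    regs = [[] for _ in range(nfields)]
    segment_bytes = nfields * eew_bytes  # segments are packed contiguously
    for i in range(vl):  # i is the segment (element) index
        for f in range(nfields):  # field order within a segment is unordered in hardware
            addr = base + i * segment_bytes + f * eew_bytes
            regs[f].append(int.from_bytes(mem[addr:addr + eew_bytes], "little"))
    return regs
```

Applied to packed 8-bit RGB pixels with `nfields=3`, the three returned lists hold the red, green, and blue channels respectively.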
+ +---- + # Example 1 + # Memory structure holds packed RGB pixels (24-bit data structure, 8bpp) + vsetvli a1, t0, e8, ta, ma + vlseg3e8.v v8, (a0), vm + # v8 holds the red pixels + # v9 holds the green pixels + # v10 holds the blue pixels + + # Example 2 + # Memory structure holds complex values, 32b for real and 32b for imaginary + vsetvli a1, t0, e32, ta, ma + vlseg2e32.v v8, (a0), vm + # v8 holds real + # v9 holds imaginary +---- + +There are also fault-only-first versions of the unit-stride instructions. + +---- + # Template for vector fault-only-first unit-stride segment loads. + vlseg<nf>e<eew>ff.v vd, (rs1), vm # Unit-stride fault-only-first segment loads +---- + +For fault-only-first segment loads, if an exception is detected partway +through accessing a segment, regardless of whether the element index is zero, +it is implementation-defined whether a subset of the segment is loaded. + +These instructions may overwrite destination vector register group +elements past the point at which a trap is reported or past the point +at which vector length is trimmed. + +===== Vector Strided Segment Loads and Stores + +Vector strided segment loads and stores move contiguous segments where +each segment is separated by the byte-stride offset given in the `rs2` +GPR argument. + +NOTE: Negative and zero strides are supported. + +---- + # Format + vlsseg<nf>e<eew>.v vd, (rs1), rs2, vm # Strided segment loads + vssseg<nf>e<eew>.v vs3, (rs1), rs2, vm # Strided segment stores + + # Examples + vsetvli a1, t0, e8, ta, ma + vlsseg3e8.v v4, (x5), x6 # Load bytes at addresses x5+i*x6 into v4[i], + # and bytes at addresses x5+i*x6+1 into v5[i], + # and bytes at addresses x5+i*x6+2 into v6[i].
+ + # Examples + vsetvli a1, t0, e32, ta, ma + vssseg2e32.v v2, (x5), x6 # Store words from v2[i] to address x5+i*x6 + # and words from v3[i] to address x5+i*x6+4 +---- + +Accesses to the fields within each segment can occur in any order, +including the case where the byte stride is such that segments overlap +in memory. + +===== Vector Indexed Segment Loads and Stores + +Vector indexed segment loads and stores move contiguous segments where +each segment is located at an address given by adding the scalar base +address in the `rs1` field to byte offsets in vector register `vs2`. +Both ordered and unordered forms are provided, where the ordered forms +access segments in element order. However, even for the ordered form, +accesses to the fields within an individual segment are not ordered +with respect to each other. + +The data vector register group has EEW=SEW, EMUL=LMUL, while the index +vector register group has EEW encoded in the instruction with +EMUL=(EEW/SEW)*LMUL. +The EMUL * NFIELDS {le} 8 constraint applies to the data vector register group. + +---- + # Format + vluxseg<nf>ei<eew>.v vd, (rs1), vs2, vm # Indexed-unordered segment loads + vloxseg<nf>ei<eew>.v vd, (rs1), vs2, vm # Indexed-ordered segment loads + vsuxseg<nf>ei<eew>.v vs3, (rs1), vs2, vm # Indexed-unordered segment stores + vsoxseg<nf>ei<eew>.v vs3, (rs1), vs2, vm # Indexed-ordered segment stores + + # Examples + vsetvli a1, t0, e8, ta, ma + vluxseg3ei8.v v4, (x5), v3 # Load bytes at addresses x5+v3[i] into v4[i], + # and bytes at addresses x5+v3[i]+1 into v5[i], + # and bytes at addresses x5+v3[i]+2 into v6[i]. + + # Examples + vsetvli a1, t0, e32, ta, ma + vsuxseg2ei32.v v2, (x5), v5 # Store words from v2[i] to address x5+v5[i] + # and words from v3[i] to address x5+v5[i]+4 +---- + +For vector indexed segment loads, the destination vector register +groups cannot overlap the source vector register group (specified by +`vs2`), else the instruction encoding is reserved.
+ +NOTE: This constraint supports restart of indexed segment loads +that raise exceptions partway through loading a structure. + +==== Vector Load/Store Whole Register Instructions + +Format for Vector Load Whole Register Instructions under LOAD-FP major opcode + +//// +31 29 28 27 26 25 24 20 19 15 14 12 11 7 6 0 + nf | mew| 00 | 1| 01000 | rs1 | width | vd |0000111| VLR +//// + +```wavedrom +{reg: [ + {bits: 7, name: 0x07, attr: 'VL*R*'}, + {bits: 5, name: 'vd', attr: 'destination of load', type: 2}, + {bits: 3, name: 'width'}, + {bits: 5, name: 'rs1', attr: 'base address', type: 4}, + {bits: 5, name: 8, attr: 'lumop'}, + {bits: 1, name: 1, attr: 'vm'}, + {bits: 2, name: 0x10000, attr: 'mop'}, + {bits: 1, name: 'mew'}, + {bits: 3, name: 'nf'}, +]} +``` + +Format for Vector Store Whole Register Instructions under STORE-FP major opcode + +//// +31 29 28 27 26 25 24 20 19 15 14 12 11 7 6 0 + nf | 0 | 00 | 1| 01000 | rs1 | 000 | vs3 |0100111| VSR +//// + +```wavedrom +{reg: [ + {bits: 7, name: 0x27, attr: 'VS*R*'}, + {bits: 5, name: 'vs3', attr: 'store data', type: 2}, + {bits: 3, name: 0x1000}, + {bits: 5, name: 'rs1', attr: 'base address', type: 4}, + {bits: 5, name: 8, attr: 'sumop'}, + {bits: 1, name: 1, attr: 'vm'}, + {bits: 2, name: 0x100, attr: 'mop'}, + {bits: 1, name: 0x100, attr: 'mew'}, + {bits: 3, name: 'nf'}, +]} +``` + +These instructions load and store whole vector register groups. + +NOTE: These instructions are intended to be used to save and restore +vector registers when the type or length of the current contents of +the vector register is not known, or where modifying `vl` and `vtype` +would be costly. Examples include compiler register spills, vector +function calls where values are passed in vector registers, interrupt +handlers, and OS context switches. Software can determine the number +of bytes transferred by reading the `vlenb` register. 
+ +The load instructions have an EEW encoded in the `mew` and `width` +fields following the pattern of regular unit-stride loads. + +NOTE: Because in-register byte layouts are identical to in-memory byte +layouts, the same data is written to the destination register group +regardless of EEW. +Hence, it would have sufficed to provide only EEW=8 variants. +The full set of EEW variants is provided so that the encoded EEW can be used +as a hint to indicate the destination register group will next be accessed +with this EEW, which aids implementations that rearrange data internally. + +The vector whole register store instructions are encoded similar to +unmasked unit-stride store of elements with EEW=8. + +The `nf` field encodes how many vector registers to load and store using the NFIELDS encoding (Figure <<fig-nf>>). +The encoded number of registers must be a power of 2 and the vector +register numbers must be aligned as with a vector register group, +otherwise the instruction encoding is reserved. NFIELDS +indicates the number of vector registers to transfer, numbered +successively after the base. Only NFIELDS values of 1, 2, 4, 8 are +supported, with other values reserved. When multiple registers are +transferred, the lowest-numbered vector register is held in the +lowest-numbered memory addresses and successive vector register +numbers are placed contiguously in memory. + +The instructions operate with an effective vector length, +`evl`=NFIELDS*VLEN/EEW, regardless of current settings in `vtype` and +`vl`. The usual property that no elements are written if `vstart` +{ge} `vl` does not apply to these instructions. Instead, no elements +are written if `vstart` {ge} `evl`. + +The instructions operate similarly to unmasked unit-stride load and +store instructions, with the base address passed in the scalar `x` +register specified by `rs1`.
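The effective vector length rule for whole register loads and stores can be worked through numerically. The following is a minimal sketch with hypothetical helper names; it assumes VLEN and EEW are given in bits and only models the arithmetic stated above.

```python
def whole_register_evl(nfields, vlen, eew):
    """Effective vector length: evl = NFIELDS * VLEN / EEW."""
    if nfields not in (1, 2, 4, 8):
        raise ValueError("only NFIELDS values of 1, 2, 4, 8 are supported")
    return nfields * vlen // eew


def whole_register_bytes(nfields, vlen):
    """Bytes transferred, independent of the encoded EEW."""
    return nfields * vlen // 8


def elements_written(vstart, evl):
    """No elements are written when vstart >= evl (vl is not consulted)."""
    return max(evl - vstart, 0)
```

For VLEN=128, a `vl2re16.v` gives `evl` = 2*128/16 = 16 halfwords, which is the same 32 bytes that `vl2re8.v` would transfer.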
+ +Implementations are allowed to raise a misaligned address exception on +whole register loads and stores if the base address is not naturally +aligned to the larger of the size of the encoded EEW in bytes (EEW/8) +or the implementation's smallest supported SEW size in bytes +(SEW~MIN~/8). + +NOTE: Allowing misaligned exceptions to be raised based on +non-alignment to the encoded EEW simplifies the implementation of these +instructions. Some subset implementations might not support smaller +SEW widths, so are allowed to report misaligned exceptions for the +smallest supported SEW even if larger than encoded EEW. An extreme +non-standard implementation might have SEW~MIN~>XLEN for example. Software +environments can mandate the minimum alignment requirements to support +an ABI. + +---- + # Format of whole register load and store instructions. + vl1r.v v3, (a0) # Pseudoinstruction equal to vl1re8.v + + vl1re8.v v3, (a0) # Load v3 with VLEN/8 bytes held at address in a0 + vl1re16.v v3, (a0) # Load v3 with VLEN/16 halfwords held at address in a0 + vl1re32.v v3, (a0) # Load v3 with VLEN/32 words held at address in a0 + vl1re64.v v3, (a0) # Load v3 with VLEN/64 doublewords held at address in a0 + + vl2r.v v2, (a0) # Pseudoinstruction equal to vl2re8.v + + vl2re8.v v2, (a0) # Load v2-v3 with 2*VLEN/8 bytes from address in a0 + vl2re16.v v2, (a0) # Load v2-v3 with 2*VLEN/16 halfwords held at address in a0 + vl2re32.v v2, (a0) # Load v2-v3 with 2*VLEN/32 words held at address in a0 + vl2re64.v v2, (a0) # Load v2-v3 with 2*VLEN/64 doublewords held at address in a0 + + vl4r.v v4, (a0) # Pseudoinstruction equal to vl4re8.v + + vl4re8.v v4, (a0) # Load v4-v7 with 4*VLEN/8 bytes from address in a0 + vl4re16.v v4, (a0) + vl4re32.v v4, (a0) + vl4re64.v v4, (a0) + + vl8r.v v8, (a0) # Pseudoinstruction equal to vl8re8.v + + vl8re8.v v8, (a0) # Load v8-v15 with 8*VLEN/8 bytes from address in a0 + vl8re16.v v8, (a0) + vl8re32.v v8, (a0) + vl8re64.v v8, (a0) + + vs1r.v v3, (a1) # 
Store v3 to address in a1 + vs2r.v v2, (a1) # Store v2-v3 to address in a1 + vs4r.v v4, (a1) # Store v4-v7 to address in a1 + vs8r.v v8, (a1) # Store v8-v15 to address in a1 +---- + +NOTE: Implementations should raise illegal instruction exceptions on +`vl<nf>r` instructions for EEW values that are not supported. + +NOTE: We have considered adding a whole register mask load instruction +(`vl1rm.v`) but have decided to omit from initial extension. The +primary purpose would be to inform the microarchitecture that the data +will be used as a mask. The same effect can be achieved with the +following code sequence, whose cost is at most four instructions. Of +these, the first could likely be removed as `vl` is often already +in a scalar register, and the last might already be present if the +following vector instruction needs a new SEW/LMUL. So, in best case +only two instructions (of which only one performs vector operations) are needed to synthesize the effect of the +dedicated instruction: +---- + csrr t0, vl # Save current vl (potentially not needed) + vsetvli t1, x0, e8, m8, ta, ma # Maximum VLMAX + vlm.v v0, (a0) # Load mask register + vsetvli x0, t0, <new type> # Restore vl (potentially already present) +---- + +=== Vector Memory Alignment Constraints + +If an element accessed by a vector memory instruction is not naturally +aligned to the size of the element, either the element is transferred +successfully or an address misaligned exception is raised on that +element. + +Support for misaligned vector memory accesses is independent of an +implementation's support for misaligned scalar memory accesses. + +NOTE: An implementation may have neither, one, or both scalar and +vector memory accesses support some or all misaligned accesses in +hardware. A separate PMA should be defined to determine if vector +misaligned accesses are supported in the associated address range. + +Vector misaligned memory accesses follow the same rules for atomicity +as scalar misaligned memory accesses.
+ +=== Vector Memory Consistency Model + +Vector memory instructions appear to execute in program order on the +local hart. + +Vector memory instructions follow RVWMO at the instruction level. +If the Ztso extension is implemented, vector memory instructions additionally +follow RVTSO at the instruction level. + +Except for vector indexed-ordered loads and stores, element operations +are unordered within the instruction. + +Vector indexed-ordered loads and stores read and write elements +from/to memory in element order respectively, +obeying RVWMO at the element level. + +NOTE: Ztso only imposes RVTSO at the instruction level; intra-instruction +ordering follows RVWMO regardless of whether Ztso is implemented. + +NOTE: More formal definitions required. + +Instructions affected by the vector length register `vl` have a control +dependency on `vl`, rather than a data dependency. +Similarly, masked vector instructions have a control dependency on the source +mask register, rather than a data dependency. + +NOTE: Treating the vector length and mask as control rather than data +typically matches the semantics of the corresponding scalar code, where branch +instructions ordinarily would have been used. +Treating the mask as control allows masked vector load instructions to access +memory before the mask value is known, without the need for +a misspeculation-recovery mechanism. + +=== Vector Arithmetic Instruction Formats + +The vector arithmetic instructions use a new major opcode (OP-V = +1010111~2~) which neighbors OP-FP. The three-bit `funct3` field is +used to define sub-categories of vector instructions. + +include::valu-format.adoc[] + +[[sec-arithmetic-encoding]] +==== Vector Arithmetic Instruction encoding + +The `funct3` field encodes the operand type and source locations. 
+ +.funct3 +[cols="1,1,1,3,5,5"] +|=== +3+| funct3[2:0] | Category | Operands | Type of scalar operand + +| 0 | 0 | 0 | OPIVV | vector-vector | N/A +| 0 | 0 | 1 | OPFVV | vector-vector | N/A +| 0 | 1 | 0 | OPMVV | vector-vector | N/A +| 0 | 1 | 1 | OPIVI | vector-immediate | `imm[4:0]` +| 1 | 0 | 0 | OPIVX | vector-scalar | GPR `x` register `rs1` +| 1 | 0 | 1 | OPFVF | vector-scalar | FP `f` register `rs1` +| 1 | 1 | 0 | OPMVX | vector-scalar | GPR `x` register `rs1` +| 1 | 1 | 1 | OPCFG | scalars-imms | GPR `x` register `rs1` & `rs2`/`imm` +|=== + +Integer operations are performed using unsigned or two's-complement +signed integer arithmetic depending on the opcode. + +NOTE: In this discussion, fixed-point operations are +considered to be integer operations. + +All standard vector floating-point arithmetic operations follow the +IEEE-754/2008 standard. All vector floating-point operations use the +dynamic rounding mode in the `frm` register. Use of the `frm` field +when it contains an invalid rounding mode by any vector floating-point +instruction--even those that do not depend on the rounding mode, or +when `vl`=0, or when `vstart` {ge} `vl`--is reserved. + +NOTE: All vector floating-point code will rely on a valid value in +`frm`. Implementations can make all vector FP instructions report +exceptions when the rounding mode is invalid to simplify control +logic. + +Vector-vector operations take two vectors of operands from vector +register groups specified by `vs2` and `vs1` respectively. + +Vector-scalar operations can have three possible forms. In all three forms, +the vector register group operand is specified by `vs2`. The second +scalar source operand comes from one of three alternative sources: + +. For integer operations, the scalar can be a 5-bit immediate, `imm[4:0]`, encoded +in the `rs1` field. The value is sign-extended to SEW bits, unless +otherwise specified. + +. 
For integer operations, the scalar can be taken from the scalar `x` +register specified by `rs1`. If XLEN>SEW, the least-significant SEW +bits of the `x` register are used, unless otherwise specified. If +XLEN<SEW, the value from the `x` register is sign-extended to SEW bits. + +. For floating-point operations, the scalar can be taken from a scalar `f` +register. If FLEN > SEW, the value in the `f` registers is +checked for a valid NaN-boxed value, in which case the +least-significant SEW bits of the `f` register are used, else the +canonical NaN value is used. Vector instructions where any +floating-point vector operand's EEW is not a supported floating-point +type width (which includes when FLEN < SEW) are reserved. + +NOTE: Some instructions _zero_-extend the 5-bit immediate, and denote this +by naming the immediate `uimm` in the assembly syntax. + +NOTE: When adding a vector extension to the proposed Zfinx/Zdinx/Zhinx +extensions, floating-point scalar arguments are taken from the `x` +registers. NaN-boxing is not supported in these extensions, and so +the vector floating-point scalar value is produced using the same +rules as for an integer scalar operand (i.e., when XLEN > SEW use the +lowest SEW bits, when XLEN < SEW use the sign-extended value). + +Vector arithmetic instructions are masked under control of the `vm` +field. + +---- +# Assembly syntax pattern for vector binary arithmetic instructions + +# Operations returning vector results, masked by vm (v0.t, <nothing>) +vop.vv vd, vs2, vs1, vm # integer vector-vector vd[i] = vs2[i] op vs1[i] +vop.vx vd, vs2, rs1, vm # integer vector-scalar vd[i] = vs2[i] op x[rs1] +vop.vi vd, vs2, imm, vm # integer vector-immediate vd[i] = vs2[i] op imm + +vfop.vv vd, vs2, vs1, vm # FP vector-vector operation vd[i] = vs2[i] fop vs1[i] +vfop.vf vd, vs2, rs1, vm # FP vector-scalar operation vd[i] = vs2[i] fop f[rs1] +---- + +NOTE: In the encoding, `vs2` is the first operand, while `rs1/imm` +is the second operand. This is the opposite to the standard scalar +ordering.
This arrangement retains the existing encoding conventions +that instructions that read only one scalar register, read it from +`rs1`, and that 5-bit immediates are sourced from the `rs1` field. + +---- +# Assembly syntax pattern for vector ternary arithmetic instructions (multiply-add) + +# Integer operations overwriting sum input +vop.vv vd, vs1, vs2, vm # vd[i] = vs1[i] * vs2[i] + vd[i] +vop.vx vd, rs1, vs2, vm # vd[i] = x[rs1] * vs2[i] + vd[i] + +# Integer operations overwriting product input +vop.vv vd, vs1, vs2, vm # vd[i] = vs1[i] * vd[i] + vs2[i] +vop.vx vd, rs1, vs2, vm # vd[i] = x[rs1] * vd[i] + vs2[i] + +# Floating-point operations overwriting sum input +vfop.vv vd, vs1, vs2, vm # vd[i] = vs1[i] * vs2[i] + vd[i] +vfop.vf vd, rs1, vs2, vm # vd[i] = f[rs1] * vs2[i] + vd[i] + +# Floating-point operations overwriting product input +vfop.vv vd, vs1, vs2, vm # vd[i] = vs1[i] * vd[i] + vs2[i] +vfop.vf vd, rs1, vs2, vm # vd[i] = f[rs1] * vd[i] + vs2[i] +---- + +NOTE: For ternary multiply-add operations, the assembler syntax always +places the destination vector register first, followed by either `rs1` +or `vs1`, then `vs2`. This ordering provides a more natural reading +of the assembler for these ternary operations, as the multiply +operands are always next to each other. + +[[sec-widening]] +==== Widening Vector Arithmetic Instructions + +A few vector arithmetic instructions are defined to be __widening__ +operations where the destination vector register group has EEW=2*SEW +and EMUL=2*LMUL. These are generally given a `vw*` prefix on the +opcode, or `vfw*` for vector floating-point instructions. + +The first vector register group operand can be either single or +double-width. 
+ +---- +Assembly syntax pattern for vector widening arithmetic instructions + +# Double-width result, two single-width sources: 2*SEW = SEW op SEW +vwop.vv vd, vs2, vs1, vm # integer vector-vector vd[i] = vs2[i] op vs1[i] +vwop.vx vd, vs2, rs1, vm # integer vector-scalar vd[i] = vs2[i] op x[rs1] + +# Double-width result, first source double-width, second source single-width: 2*SEW = 2*SEW op SEW +vwop.wv vd, vs2, vs1, vm # integer vector-vector vd[i] = vs2[i] op vs1[i] +vwop.wx vd, vs2, rs1, vm # integer vector-scalar vd[i] = vs2[i] op x[rs1] +---- + +NOTE: Originally, a `w` suffix was used on opcode, but this could be +confused with the use of a `w` suffix to mean word-sized operations in +doubleword integers, so the `w` was moved to prefix. + +NOTE: The floating-point widening operations were changed to `vfw*` +from `vwf*` to be more consistent with any scalar widening +floating-point operations that will be written as `fw*`. + +Widening instruction encodings must follow the constraints in Section +<>. + +[[sec-narrowing]] +==== Narrowing Vector Arithmetic Instructions + +A few instructions are provided to convert double-width source vectors +into single-width destination vectors. These instructions convert a +vector register group specified by `vs2` with EEW/EMUL=2*SEW/2*LMUL to a vector register +group with the current SEW/LMUL setting. Where there is a second +source vector register group (specified by `vs1`), this has the same +(narrower) width as the result (i.e., EEW=SEW). + +NOTE: An alternative design decision would have been to treat SEW/LMUL +as defining the size of the source vector register group. The choice +here is motivated by the belief the chosen approach will require fewer +`vtype` changes. + +NOTE: Compare operations that set a mask register are also +implicitly a narrowing operation. + +A `vn*` prefix on the opcode is used to distinguish these instructions +in the assembler, or a `vfn*` prefix for narrowing floating-point +opcodes. 
The double-width source vector register group is signified +by a `w` in the source operand suffix (e.g., `vnsra.wv`). + +---- +# Assembly syntax pattern for vector narrowing arithmetic instructions + +# Single-width result vd, double-width source vs2, single-width source vs1/rs1 +# SEW = 2*SEW op SEW +vnop.wv vd, vs2, vs1, vm # integer vector-vector vd[i] = vs2[i] op vs1[i] +vnop.wx vd, vs2, rs1, vm # integer vector-scalar vd[i] = vs2[i] op x[rs1] +---- + +Narrowing instruction encodings must follow the constraints in Section +<>. + +[[sec-vector-integer]] +=== Vector Integer Arithmetic Instructions + +A set of vector integer arithmetic instructions is provided. Unless +otherwise stated, integer operations wrap around on overflow. + +==== Vector Single-Width Integer Add and Subtract + +Vector integer add and subtract are provided. Reverse-subtract +instructions are also provided for the vector-scalar and vector-immediate forms. + +---- +# Integer adds. +vadd.vv vd, vs2, vs1, vm # Vector-vector +vadd.vx vd, vs2, rs1, vm # vector-scalar +vadd.vi vd, vs2, imm, vm # vector-immediate + +# Integer subtract +vsub.vv vd, vs2, vs1, vm # Vector-vector +vsub.vx vd, vs2, rs1, vm # vector-scalar + +# Integer reverse subtract +vrsub.vx vd, vs2, rs1, vm # vd[i] = x[rs1] - vs2[i] +vrsub.vi vd, vs2, imm, vm # vd[i] = imm - vs2[i] +---- + +NOTE: A vector of integer values can be negated using a +reverse-subtract instruction with a scalar operand of `x0`. An +assembly pseudoinstruction `vneg.v vd,vs` = `vrsub.vx vd,vs,x0` is provided. + +==== Vector Widening Integer Add/Subtract + +The widening add/subtract instructions are provided in both signed and +unsigned variants, depending on whether the narrower source operands +are first sign- or zero-extended before forming the double-width sum.
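
The difference between the signed and unsigned widening adds can be illustrated with a small reference model. This is only a sketch, not normative text: the Python function names are hypothetical, elements are plain integers holding SEW-bit patterns, and masking, `vl`, and tail handling are not modeled.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def zero_extend(value, bits):
    """Interpret the low `bits` bits of value as an unsigned integer."""
    return value & ((1 << bits) - 1)


def widening_add(vs2, vs1, sew, signed):
    """Model of vwadd.vv (signed=True) or vwaddu.vv (signed=False).

    Sources are SEW-bit patterns; results are 2*SEW-bit patterns.
    """
    extend = sign_extend if signed else zero_extend
    wide_mask = (1 << (2 * sew)) - 1
    return [(extend(a, sew) + extend(b, sew)) & wide_mask
            for a, b in zip(vs2, vs1)]
```

With SEW=8, the bit pattern `0x80` is 128 when zero-extended and -128 when sign-extended, so the same source registers yield different double-width sums under the two variants.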
+ +---- +# Widening unsigned integer add/subtract, 2*SEW = SEW +/- SEW +vwaddu.vv vd, vs2, vs1, vm # vector-vector +vwaddu.vx vd, vs2, rs1, vm # vector-scalar +vwsubu.vv vd, vs2, vs1, vm # vector-vector +vwsubu.vx vd, vs2, rs1, vm # vector-scalar + +# Widening signed integer add/subtract, 2*SEW = SEW +/- SEW +vwadd.vv vd, vs2, vs1, vm # vector-vector +vwadd.vx vd, vs2, rs1, vm # vector-scalar +vwsub.vv vd, vs2, vs1, vm # vector-vector +vwsub.vx vd, vs2, rs1, vm # vector-scalar + +# Widening unsigned integer add/subtract, 2*SEW = 2*SEW +/- SEW +vwaddu.wv vd, vs2, vs1, vm # vector-vector +vwaddu.wx vd, vs2, rs1, vm # vector-scalar +vwsubu.wv vd, vs2, vs1, vm # vector-vector +vwsubu.wx vd, vs2, rs1, vm # vector-scalar + +# Widening signed integer add/subtract, 2*SEW = 2*SEW +/- SEW +vwadd.wv vd, vs2, vs1, vm # vector-vector +vwadd.wx vd, vs2, rs1, vm # vector-scalar +vwsub.wv vd, vs2, vs1, vm # vector-vector +vwsub.wx vd, vs2, rs1, vm # vector-scalar +---- + +NOTE: An integer value can be doubled in width using the widening add +instructions with a scalar operand of `x0`. Assembly +pseudoinstructions `vwcvt.x.x.v vd,vs,vm` = `vwadd.vx vd,vs,x0,vm` and +`vwcvtu.x.x.v vd,vs,vm` = `vwaddu.vx vd,vs,x0,vm` are provided. + +==== Vector Integer Extension + +The vector integer extension instructions zero- or sign-extend a +source vector integer operand with EEW less than SEW to fill SEW-sized +elements in the destination. The EEW of the source is 1/2, 1/4, or +1/8 of SEW, while EMUL of the source is (EEW/SEW)*LMUL. The +destination has EEW equal to SEW and EMUL equal to LMUL. 
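
The EEW/EMUL relationship and the extension itself can be sketched as follows. This is an illustrative model under stated assumptions: the helper names are hypothetical, `factor` stands for the 2, 4, or 8 in the `vf2`/`vf4`/`vf8` suffix, and the reserved-encoding checks are not modeled.

```python
from fractions import Fraction


def source_eew_emul(sew, lmul, factor):
    """Return (EEW, EMUL) of the source operand of vzext/vsext.vf<factor>.

    EEW = SEW / factor and EMUL = (EEW / SEW) * LMUL.
    """
    eew = sew // factor
    emul = Fraction(eew, sew) * Fraction(lmul)
    return eew, emul


def extend_elements(vs2, sew, factor, signed):
    """Model of vsext (signed=True) or vzext (signed=False).

    Source elements are EEW-bit patterns; results are SEW-bit patterns.
    """
    eew = sew // factor
    result = []
    for x in vs2:
        x &= (1 << eew) - 1
        if signed and x >> (eew - 1):
            x -= 1 << eew            # reinterpret as a negative value
        result.append(x & ((1 << sew) - 1))
    return result
```

For example, with SEW=32 and LMUL=1, a `vf4` source has EEW=8 and EMUL=1/4.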
+ +---- +vzext.vf2 vd, vs2, vm # Zero-extend SEW/2 source to SEW destination +vsext.vf2 vd, vs2, vm # Sign-extend SEW/2 source to SEW destination +vzext.vf4 vd, vs2, vm # Zero-extend SEW/4 source to SEW destination +vsext.vf4 vd, vs2, vm # Sign-extend SEW/4 source to SEW destination +vzext.vf8 vd, vs2, vm # Zero-extend SEW/8 source to SEW destination +vsext.vf8 vd, vs2, vm # Sign-extend SEW/8 source to SEW destination +---- + +If the source EEW is not a supported width, or source EMUL would be +below the minimum legal LMUL, the instruction encoding is reserved. + +NOTE: Standard vector load instructions access memory values that are +the same size as the destination register elements. Some application +code needs to operate on a range of operand widths in a wider element, +for example, loading a byte from memory and adding to an eight-byte +element. To avoid having to provide the cross-product of the number +of vector load instructions by the number of data types (byte, word, +halfword, and also signed/unsigned variants), we instead add explicit +extension instructions that can be used if an appropriate widening +arithmetic instruction is not available. + +==== Vector Integer Add-with-Carry / Subtract-with-Borrow Instructions + +To support multi-word integer arithmetic, instructions that operate on +a carry bit are provided. For each operation (add or subtract), two +instructions are provided: one to provide the result (SEW width), and +the second to generate the carry output (single bit encoded as a mask +boolean). + +The carry inputs and outputs are represented using the mask register +layout as described in Section <>. Due to +encoding constraints, the carry input must come from the implicit `v0` +register, but carry outputs can be written to any vector register that +respects the source/destination overlap restrictions. + +`vadc` and `vsbc` add or subtract the source operands and the carry-in or +borrow-in, and write the result to vector register `vd`. 
+These instructions are encoded as masked instructions (`vm=0`), but they operate +on and write back all body elements. +Encodings corresponding to the unmasked versions (`vm=1`) are reserved. + +`vmadc` and `vmsbc` add or subtract the source operands, optionally +add the carry-in or subtract the borrow-in if masked (`vm=0`), and +write the result back to mask register `vd`. If unmasked (`vm=1`), +there is no carry-in or borrow-in. These instructions operate on and +write back all body elements, even if masked. Because these +instructions produce a mask value, they always operate with a +tail-agnostic policy. + +---- + # Produce sum with carry. + + # vd[i] = vs2[i] + vs1[i] + v0.mask[i] + vadc.vvm vd, vs2, vs1, v0 # Vector-vector + + # vd[i] = vs2[i] + x[rs1] + v0.mask[i] + vadc.vxm vd, vs2, rs1, v0 # Vector-scalar + + # vd[i] = vs2[i] + imm + v0.mask[i] + vadc.vim vd, vs2, imm, v0 # Vector-immediate + + # Produce carry out in mask register format + + # vd.mask[i] = carry_out(vs2[i] + vs1[i] + v0.mask[i]) + vmadc.vvm vd, vs2, vs1, v0 # Vector-vector + + # vd.mask[i] = carry_out(vs2[i] + x[rs1] + v0.mask[i]) + vmadc.vxm vd, vs2, rs1, v0 # Vector-scalar + + # vd.mask[i] = carry_out(vs2[i] + imm + v0.mask[i]) + vmadc.vim vd, vs2, imm, v0 # Vector-immediate + + # vd.mask[i] = carry_out(vs2[i] + vs1[i]) + vmadc.vv vd, vs2, vs1 # Vector-vector, no carry-in + + # vd.mask[i] = carry_out(vs2[i] + x[rs1]) + vmadc.vx vd, vs2, rs1 # Vector-scalar, no carry-in + + # vd.mask[i] = carry_out(vs2[i] + imm) + vmadc.vi vd, vs2, imm # Vector-immediate, no carry-in +---- + +Because implementing a carry propagation requires executing two +instructions with unchanged inputs, destructive accumulations will +require an additional move to obtain correct results. 
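
The ordering constraint can be checked with a small model of a multi-word add. This is a sketch only: the Python names are hypothetical, SEW is fixed at 8 for readability, mask registers are modeled as lists of 0/1, and `vl` and tail handling are ignored.

```python
SEW = 8
ELEMENT_MASK = (1 << SEW) - 1


def vmadc_vvm(vs2, vs1, v0):
    """Carry out of vs2[i] + vs1[i] + v0.mask[i], one bit per element."""
    return [((a + b + c) >> SEW) & 1 for a, b, c in zip(vs2, vs1, v0)]


def vadc_vvm(vs2, vs1, v0):
    """SEW-bit sum of vs2[i] + vs1[i] + v0.mask[i]."""
    return [(a + b + c) & ELEMENT_MASK for a, b, c in zip(vs2, vs1, v0)]


def add_multiword(acc_words, addend_words):
    """Add multi-word integers held as per-word vectors, least-significant word first.

    acc_words[k][i] is word k of element i. Returns (sum words, final carry mask).
    """
    carry = [0] * len(acc_words[0])
    result = []
    for acc, addend in zip(acc_words, addend_words):
        # The carry must be computed from the same inputs as the sum,
        # so it goes to a temporary before the accumulator is overwritten.
        carry_out = vmadc_vvm(acc, addend, carry)
        result.append(vadc_vvm(acc, addend, carry))
        carry = carry_out   # corresponds to moving the temporary into v0
    return result, carry
```

If the sum were written over the accumulator first, the later carry computation would see the updated value rather than the original operand, which is why the temporary is needed.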
+ +---- + # Example multi-word arithmetic sequence, accumulating into v4 + vmadc.vvm v1, v4, v8, v0 # Get carry into temp register v1 + vadc.vvm v4, v4, v8, v0 # Calc new sum + vmmv.m v0, v1 # Move temp carry into v0 for next word +---- + +The subtract with borrow instruction `vsbc` performs the equivalent +function to support long word arithmetic for subtraction. There are +no subtract with immediate instructions. + +---- + # Produce difference with borrow. + + # vd[i] = vs2[i] - vs1[i] - v0.mask[i] + vsbc.vvm vd, vs2, vs1, v0 # Vector-vector + + # vd[i] = vs2[i] - x[rs1] - v0.mask[i] + vsbc.vxm vd, vs2, rs1, v0 # Vector-scalar + + # Produce borrow out in mask register format + + # vd.mask[i] = borrow_out(vs2[i] - vs1[i] - v0.mask[i]) + vmsbc.vvm vd, vs2, vs1, v0 # Vector-vector + + # vd.mask[i] = borrow_out(vs2[i] - x[rs1] - v0.mask[i]) + vmsbc.vxm vd, vs2, rs1, v0 # Vector-scalar + + # vd.mask[i] = borrow_out(vs2[i] - vs1[i]) + vmsbc.vv vd, vs2, vs1 # Vector-vector, no borrow-in + + # vd.mask[i] = borrow_out(vs2[i] - x[rs1]) + vmsbc.vx vd, vs2, rs1 # Vector-scalar, no borrow-in +---- + +For `vmsbc`, the borrow is defined to be 1 iff the difference, prior to +truncation, is negative. + +For `vadc` and `vsbc`, the instruction encoding is reserved if the +destination vector register is `v0`. + +NOTE: This constraint corresponds to the constraint on masked vector +operations that overwrite the mask register. + +==== Vector Bitwise Logical Instructions + +---- +# Bitwise logical operations. 
+vand.vv vd, vs2, vs1, vm # Vector-vector +vand.vx vd, vs2, rs1, vm # vector-scalar +vand.vi vd, vs2, imm, vm # vector-immediate + +vor.vv vd, vs2, vs1, vm # Vector-vector +vor.vx vd, vs2, rs1, vm # vector-scalar +vor.vi vd, vs2, imm, vm # vector-immediate + +vxor.vv vd, vs2, vs1, vm # Vector-vector +vxor.vx vd, vs2, rs1, vm # vector-scalar +vxor.vi vd, vs2, imm, vm # vector-immediate +---- + +NOTE: With an immediate of -1, the vector-immediate form of the `vxor` +instruction provides a bitwise NOT operation. This is provided as +an assembler pseudoinstruction `vnot.v vd,vs,vm` = `vxor.vi vd,vs,-1,vm`. + +==== Vector Single-Width Shift Instructions + +A full set of vector shift instructions is provided, including +logical shift left (`sll`), and logical (zero-extending `srl`) and +arithmetic (sign-extending `sra`) shift right. The data to be shifted +is in the vector register group specified by `vs2` and the shift +amount value can come from a vector register group `vs1`, a scalar +integer register `rs1`, or a zero-extended 5-bit immediate. Only the low +lg2(SEW) bits of the shift-amount value are used to control the shift +amount. + +---- +# Bit shift operations +vsll.vv vd, vs2, vs1, vm # Vector-vector +vsll.vx vd, vs2, rs1, vm # vector-scalar +vsll.vi vd, vs2, uimm, vm # vector-immediate + +vsrl.vv vd, vs2, vs1, vm # Vector-vector +vsrl.vx vd, vs2, rs1, vm # vector-scalar +vsrl.vi vd, vs2, uimm, vm # vector-immediate + +vsra.vv vd, vs2, vs1, vm # Vector-vector +vsra.vx vd, vs2, rs1, vm # vector-scalar +vsra.vi vd, vs2, uimm, vm # vector-immediate +---- + +==== Vector Narrowing Integer Right Shift Instructions + +The narrowing right shifts extract a smaller field from a wider +operand and have both zero-extending (`srl`) and sign-extending +(`sra`) forms. The shift amount can come from a vector register +group, or a scalar `x` register, or a zero-extended 5-bit immediate.
+The low lg2(2*SEW) bits of the shift-amount value are +used (e.g., the low 6 bits for a SEW=64-bit to SEW=32-bit narrowing +operation). + +---- + # Narrowing shift right logical, SEW = (2*SEW) >> SEW + vnsrl.wv vd, vs2, vs1, vm # vector-vector + vnsrl.wx vd, vs2, rs1, vm # vector-scalar + vnsrl.wi vd, vs2, uimm, vm # vector-immediate + + # Narrowing shift right arithmetic, SEW = (2*SEW) >> SEW + vnsra.wv vd, vs2, vs1, vm # vector-vector + vnsra.wx vd, vs2, rs1, vm # vector-scalar + vnsra.wi vd, vs2, uimm, vm # vector-immediate +---- + +NOTE: Future extensions might add support for versions that narrow to +a destination that is 1/4 the width of the source. + +NOTE: An integer value can be halved in width using the narrowing integer +shift instructions with a scalar operand of `x0`. An assembly +pseudoinstruction is provided `vncvt.x.x.w vd,vs,vm` = `vnsrl.wx vd,vs,x0,vm`. + +==== Vector Integer Compare Instructions + +The following integer compare instructions write 1 to the destination +mask register element if the comparison evaluates to true, and 0 +otherwise. The destination mask vector is always held in a single +vector register, with a layout of elements as described in Section +<>. The destination mask vector register +may be the same as the source vector mask register (`v0`). 
+ +---- +# Set if equal +vmseq.vv vd, vs2, vs1, vm # Vector-vector +vmseq.vx vd, vs2, rs1, vm # vector-scalar +vmseq.vi vd, vs2, imm, vm # vector-immediate + +# Set if not equal +vmsne.vv vd, vs2, vs1, vm # Vector-vector +vmsne.vx vd, vs2, rs1, vm # vector-scalar +vmsne.vi vd, vs2, imm, vm # vector-immediate + +# Set if less than, unsigned +vmsltu.vv vd, vs2, vs1, vm # Vector-vector +vmsltu.vx vd, vs2, rs1, vm # Vector-scalar + +# Set if less than, signed +vmslt.vv vd, vs2, vs1, vm # Vector-vector +vmslt.vx vd, vs2, rs1, vm # vector-scalar + +# Set if less than or equal, unsigned +vmsleu.vv vd, vs2, vs1, vm # Vector-vector +vmsleu.vx vd, vs2, rs1, vm # vector-scalar +vmsleu.vi vd, vs2, imm, vm # Vector-immediate + +# Set if less than or equal, signed +vmsle.vv vd, vs2, vs1, vm # Vector-vector +vmsle.vx vd, vs2, rs1, vm # vector-scalar +vmsle.vi vd, vs2, imm, vm # vector-immediate + +# Set if greater than, unsigned +vmsgtu.vx vd, vs2, rs1, vm # Vector-scalar +vmsgtu.vi vd, vs2, imm, vm # Vector-immediate + +# Set if greater than, signed +vmsgt.vx vd, vs2, rs1, vm # Vector-scalar +vmsgt.vi vd, vs2, imm, vm # Vector-immediate + +# Following two instructions are not provided directly +# Set if greater than or equal, unsigned +# vmsgeu.vx vd, vs2, rs1, vm # Vector-scalar +# Set if greater than or equal, signed +# vmsge.vx vd, vs2, rs1, vm # Vector-scalar +---- + +The following table indicates how all comparisons are implemented in +native machine code. 
+ +---- +Comparison Assembler Mapping Assembler Pseudoinstruction + +va < vb vmslt{u}.vv vd, va, vb, vm +va <= vb vmsle{u}.vv vd, va, vb, vm +va > vb vmslt{u}.vv vd, vb, va, vm vmsgt{u}.vv vd, va, vb, vm +va >= vb vmsle{u}.vv vd, vb, va, vm vmsge{u}.vv vd, va, vb, vm + +va < x vmslt{u}.vx vd, va, x, vm +va <= x vmsle{u}.vx vd, va, x, vm +va > x vmsgt{u}.vx vd, va, x, vm +va >= x see below + +va < i vmsle{u}.vi vd, va, i-1, vm vmslt{u}.vi vd, va, i, vm +va <= i vmsle{u}.vi vd, va, i, vm +va > i vmsgt{u}.vi vd, va, i, vm +va >= i vmsgt{u}.vi vd, va, i-1, vm vmsge{u}.vi vd, va, i, vm + +va, vb vector register groups +x scalar integer register +i immediate +---- + +NOTE: The immediate forms of `vmslt{u}.vi` are not provided as the +immediate value can be decreased by 1 and the `vmsle{u}.vi` variants +used instead. The `vmsle.vi` range is -16 to 15, resulting in an +effective `vmslt.vi` range of -15 to 16. The `vmsleu.vi` range is 0 +to 15 giving an effective `vmsltu.vi` range of 1 to 16 (Note, +`vmsltu.vi` with immediate 0 is not useful as it is always +false). + +NOTE: Because the 5-bit vector immediates are always sign-extended, +when the high bit of the `simm5` immediate is set, `vmsleu.vi` also +supports unsigned immediate values in the range `2^SEW^-16` to +`2^SEW^-1`, allowing corresponding `vmsltu.vi` compares against +unsigned immediates in the range `2^SEW^-15` to `2^SEW^`. Note that +`vmsltu.vi` with immediate `2^SEW^` is not useful as it is always +true. + +Similarly, `vmsge{u}.vi` is not provided and the compare is +implemented using `vmsgt{u}.vi` with the immediate decremented by one. +The resulting effective `vmsge.vi` range is -15 to 16, and the +resulting effective `vmsgeu.vi` range is 1 to 16 (Note, `vmsgeu.vi` with +immediate 0 is not useful as it is always true). 
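
The immediate ranges quoted above can be checked exhaustively for a small SEW. The following is a sketch under stated assumptions: the function names are hypothetical, SEW is fixed at 8, elements are 8-bit patterns, and masking is not modeled.

```python
SEW = 8
ELEMENT_MASK = (1 << SEW) - 1


def to_signed(x):
    """Interpret the low SEW bits of x as a two's-complement integer."""
    x &= ELEMENT_MASK
    return x - (1 << SEW) if x >> (SEW - 1) else x


def vmsle_vi(va, simm5):
    """Signed va[i] <= simm5, with simm5 in the encodable range -16..15."""
    assert -16 <= simm5 <= 15
    return [int(to_signed(a) <= simm5) for a in va]


def vmsleu_vi(va, simm5):
    """Unsigned compare; the immediate is sign-extended, then viewed as unsigned."""
    assert -16 <= simm5 <= 15
    imm = simm5 & ELEMENT_MASK
    return [int((a & ELEMENT_MASK) <= imm) for a in va]


def vmslt_vi(va, i):
    """Pseudoinstruction: va < i is encoded as va <= i-1, so i ranges over -15..16."""
    return vmsle_vi(va, i - 1)


def vmsltu_vi(va, i):
    """Pseudoinstruction: unsigned va < i is encoded as unsigned va <= i-1."""
    return vmsleu_vi(va, i - 1)
```

In this model a negative `simm5` such as -16 becomes the unsigned value 2^SEW^-16, which corresponds to the upper unsigned range mentioned in the note.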
+ +NOTE: The `vmsgt` forms for register scalar and immediates are provided +to allow a single compare instruction to provide the correct +polarity of mask value without using additional mask logical +instructions. + +To reduce encoding space, the `vmsge{u}.vx` form is not directly +provided, and so the `va {ge} x` case requires special treatment. + +NOTE: The `vmsge{u}.vx` form could potentially be encoded in a +non-orthogonal way under the unused OPIVI variant of `vmslt{u}`. These +would, however, be the only instructions in OPIVI that use a scalar `x` +register. Alternatively, a further two funct6 encodings could be used, +but these would have a different operand format (writes to mask +register) than others in the same group of 8 funct6 encodings. The +current plan of record (PoR) is to omit these instructions and to synthesize them where +needed as described below. + +The `vmsge{u}.vx` operation can be synthesized by reducing the +value of `x` by 1 and using the `vmsgt{u}.vx` instruction, when it is +known that this will not underflow the representation in `x`. + +---- +Sequences to synthesize `vmsge{u}.vx` instruction + +va >= x, x > minimum + + addi t0, x, -1; vmsgt{u}.vx vd, va, t0, vm +---- + +The above sequence will usually be the most efficient implementation, +but assembler pseudoinstructions can be provided for cases where the +range of `x` is unknown.
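
The underflow caveat can be demonstrated with a small signed model. This is a sketch, not normative text: the function names are hypothetical, SEW is fixed at 8, the scalar is assumed to be truncated to its low SEW bits before the compare, and masking is not modeled. The second synthesis shown, inverting the result of `vmslt`, is one range-independent alternative.

```python
SEW = 8
ELEMENT_MASK = (1 << SEW) - 1


def to_signed(x):
    """Interpret the low SEW bits of x as a two's-complement integer."""
    x &= ELEMENT_MASK
    return x - (1 << SEW) if x >> (SEW - 1) else x


def vmsgt_vx(va, x):
    """Signed va[i] > x, using only the low SEW bits of the scalar (assumption)."""
    return [int(to_signed(a) > to_signed(x)) for a in va]


def vmslt_vx(va, x):
    """Signed va[i] < x, using only the low SEW bits of the scalar (assumption)."""
    return [int(to_signed(a) < to_signed(x)) for a in va]


def vmsge_via_addi(va, x):
    """addi t0, x, -1; vmsgt.vx vd, va, t0 -- valid only when x > minimum."""
    return vmsgt_vx(va, x - 1)


def vmsge_via_invert(va, x):
    """vmslt.vx followed by inverting every mask bit -- valid for any x."""
    return [1 - bit for bit in vmslt_vx(va, x)]


def reference_ge(va, x):
    """Direct definition of signed va[i] >= x."""
    return [int(to_signed(a) >= x) for a in va]
```

At `x` = -128 the decremented value wraps to 127 in 8 bits, so the `addi`-based sequence reports every element as false even though every element satisfies `va >= x`.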
+ +---- +unmasked va >= x + + pseudoinstruction: vmsge{u}.vx vd, va, x + expansion: vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd + +masked va >= x, vd != v0 + + pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t + expansion: vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0 + +masked va >= x, vd == v0 + + pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt + expansion: vmslt{u}.vx vt, va, x; vmandn.mm vd, vd, vt + +masked va >= x, any vd + + pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt + expansion: vmslt{u}.vx vt, va, x; vmandn.mm vt, v0, vt; vmandn.mm vd, vd, v0; vmor.mm vd, vt, vd + + The vt argument to the pseudoinstruction must name a temporary vector register that is + not the same as vd and which will be clobbered by the pseudoinstruction. +---- + +Compares effectively AND in the mask under a mask-undisturbed policy, e.g., + +---- + # (a < b) && (b < c) in two instructions when mask-undisturbed + vmslt.vv v0, va, vb # All body elements written + vmslt.vv v0, vb, vc, v0.t # Only update at set mask +---- + +Compares write mask registers, and so always operate under a +tail-agnostic policy. + +==== Vector Integer Min/Max Instructions + +Signed and unsigned integer minimum and maximum instructions are +supported. + +---- +# Unsigned minimum +vminu.vv vd, vs2, vs1, vm # Vector-vector +vminu.vx vd, vs2, rs1, vm # vector-scalar + +# Signed minimum +vmin.vv vd, vs2, vs1, vm # Vector-vector +vmin.vx vd, vs2, rs1, vm # vector-scalar + +# Unsigned maximum +vmaxu.vv vd, vs2, vs1, vm # Vector-vector +vmaxu.vx vd, vs2, rs1, vm # vector-scalar + +# Signed maximum +vmax.vv vd, vs2, vs1, vm # Vector-vector +vmax.vx vd, vs2, rs1, vm # vector-scalar +---- + +==== Vector Single-Width Integer Multiply Instructions + +The single-width multiply instructions perform a SEW-bit*SEW-bit +multiply to generate a 2*SEW-bit product, then return one half of the +product in the SEW-bit-wide destination.
The `*mul*` versions write +the low word of the product to the destination register, while the +`*mulh*` versions write the high word of the product to the +destination register. + +---- +# Signed multiply, returning low bits of product +vmul.vv vd, vs2, vs1, vm # Vector-vector +vmul.vx vd, vs2, rs1, vm # vector-scalar + +# Signed multiply, returning high bits of product +vmulh.vv vd, vs2, vs1, vm # Vector-vector +vmulh.vx vd, vs2, rs1, vm # vector-scalar + +# Unsigned multiply, returning high bits of product +vmulhu.vv vd, vs2, vs1, vm # Vector-vector +vmulhu.vx vd, vs2, rs1, vm # vector-scalar + +# Signed(vs2)-Unsigned multiply, returning high bits of product +vmulhsu.vv vd, vs2, vs1, vm # Vector-vector +vmulhsu.vx vd, vs2, rs1, vm # vector-scalar +---- + +NOTE: There is no `vmulhus.vx` opcode to return high half of +unsigned-vector * signed-scalar product. The scalar can be splatted +to a vector, then a `vmulhsu.vv` used. + +NOTE: The current `vmulh*` opcodes perform simple fractional +multiplies, but with no option to scale, round, and/or saturate the +result. A possible future extension can consider variants of `vmulh`, +`vmulhu`, `vmulhsu` that use the `vxrm` rounding mode when discarding +low half of product. There is no possibility of overflow in these +cases. + +==== Vector Integer Divide Instructions + +The divide and remainder instructions are equivalent to the RISC-V +standard scalar integer multiply/divides, with the same results for +extreme inputs. + +---- + # Unsigned divide. 
+ vdivu.vv vd, vs2, vs1, vm # Vector-vector + vdivu.vx vd, vs2, rs1, vm # vector-scalar + + # Signed divide + vdiv.vv vd, vs2, vs1, vm # Vector-vector + vdiv.vx vd, vs2, rs1, vm # vector-scalar + + # Unsigned remainder + vremu.vv vd, vs2, vs1, vm # Vector-vector + vremu.vx vd, vs2, rs1, vm # vector-scalar + + # Signed remainder + vrem.vv vd, vs2, vs1, vm # Vector-vector + vrem.vx vd, vs2, rs1, vm # vector-scalar +---- + +NOTE: The decision to include integer divide and remainder was +contentious. The argument in favor is that without a standard +instruction, software would have to pick some algorithm to perform the +operation, which would likely perform poorly on some +microarchitectures versus others. + +NOTE: There is no instruction to perform a "scalar divide by vector" +operation. + +==== Vector Widening Integer Multiply Instructions + +The widening integer multiply instructions return the full 2*SEW-bit +product from an SEW-bit*SEW-bit multiply. + +---- +# Widening signed-integer multiply +vwmul.vv vd, vs2, vs1, vm # vector-vector +vwmul.vx vd, vs2, rs1, vm # vector-scalar + +# Widening unsigned-integer multiply +vwmulu.vv vd, vs2, vs1, vm # vector-vector +vwmulu.vx vd, vs2, rs1, vm # vector-scalar + +# Widening signed(vs2)-unsigned integer multiply +vwmulsu.vv vd, vs2, vs1, vm # vector-vector +vwmulsu.vx vd, vs2, rs1, vm # vector-scalar +---- + +==== Vector Single-Width Integer Multiply-Add Instructions + +The integer multiply-add instructions are destructive and are provided +in two forms, one that overwrites the addend or minuend +(`vmacc`, `vnmsac`) and one that overwrites the first multiplicand +(`vmadd`, `vnmsub`). + +The low half of the product is added or subtracted from the third operand. + +NOTE: `sac` is intended to be read as "subtract from accumulator". The +opcode is `vnmsac` to match the (unfortunately counterintuitive) +floating-point `fnmsub` instruction definition. Similarly for the +`vnmsub` opcode. 
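
The operand roles of the four multiply-add forms can be summarized with a small model. This is an illustrative sketch: the Python function names mirror the opcodes but are hypothetical, elements are SEW-bit patterns, and masking and `vl` are not modeled.

```python
def wrap(value, sew):
    """Keep the low SEW bits, as for any wrapping integer result."""
    return value & ((1 << sew) - 1)


def vmacc(vd, vs1, vs2, sew):
    """vd[i] = +(vs1[i] * vs2[i]) + vd[i]; the addend is overwritten."""
    return [wrap(a * b + d, sew) for d, a, b in zip(vd, vs1, vs2)]


def vnmsac(vd, vs1, vs2, sew):
    """vd[i] = -(vs1[i] * vs2[i]) + vd[i]; the minuend is overwritten."""
    return [wrap(-(a * b) + d, sew) for d, a, b in zip(vd, vs1, vs2)]


def vmadd(vd, vs1, vs2, sew):
    """vd[i] = (vs1[i] * vd[i]) + vs2[i]; the multiplicand is overwritten."""
    return [wrap(a * d + b, sew) for d, a, b in zip(vd, vs1, vs2)]


def vnmsub(vd, vs1, vs2, sew):
    """vd[i] = -(vs1[i] * vd[i]) + vs2[i]; the multiplicand is overwritten."""
    return [wrap(-(a * d) + b, sew) for d, a, b in zip(vd, vs1, vs2)]
```

Because only the low SEW bits of the product are kept, the result is the same whether the operands are viewed as signed or unsigned, which is consistent with there being a single set of opcodes.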
+ +---- +# Integer multiply-add, overwrite addend +vmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] +vmacc.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i] + +# Integer multiply-sub, overwrite minuend +vnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i] +vnmsac.vx vd, rs1, vs2, vm # vd[i] = -(x[rs1] * vs2[i]) + vd[i] + +# Integer multiply-add, overwrite multiplicand +vmadd.vv vd, vs1, vs2, vm # vd[i] = (vs1[i] * vd[i]) + vs2[i] +vmadd.vx vd, rs1, vs2, vm # vd[i] = (x[rs1] * vd[i]) + vs2[i] + +# Integer multiply-sub, overwrite multiplicand +vnmsub.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) + vs2[i] +vnmsub.vx vd, rs1, vs2, vm # vd[i] = -(x[rs1] * vd[i]) + vs2[i] +---- + +==== Vector Widening Integer Multiply-Add Instructions + +The widening integer multiply-add instructions add the full 2*SEW-bit +product from a SEW-bit*SEW-bit multiply to a 2*SEW-bit value and +produce a 2*SEW-bit result. All combinations of signed and unsigned +multiply operands are supported. + +---- +# Widening unsigned-integer multiply-add, overwrite addend +vwmaccu.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] +vwmaccu.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i] + +# Widening signed-integer multiply-add, overwrite addend +vwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] +vwmacc.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i] + +# Widening signed-unsigned-integer multiply-add, overwrite addend +vwmaccsu.vv vd, vs1, vs2, vm # vd[i] = +(signed(vs1[i]) * unsigned(vs2[i])) + vd[i] +vwmaccsu.vx vd, rs1, vs2, vm # vd[i] = +(signed(x[rs1]) * unsigned(vs2[i])) + vd[i] + +# Widening unsigned-signed-integer multiply-add, overwrite addend +vwmaccus.vx vd, rs1, vs2, vm # vd[i] = +(unsigned(x[rs1]) * signed(vs2[i])) + vd[i] +---- + +==== Vector Integer Merge Instructions + +The vector integer merge instructions combine two source operands +based on a mask. 
Unlike regular arithmetic instructions, the +merge operates on all body elements (i.e., the set of elements from +`vstart` up to the current vector length in `vl`). + +The `vmerge` instructions are encoded as masked instructions (`vm=0`). +The instructions combine two +sources as follows. At elements where the mask value is zero, the +first operand is copied to the destination element, otherwise the +second operand is copied to the destination element. The first +operand is always a vector register group specified by `vs2`. The +second operand is a vector register group specified by `vs1` or a +scalar `x` register specified by `rs1` or a 5-bit sign-extended +immediate. + +---- +vmerge.vvm vd, vs2, vs1, v0 # vd[i] = v0.mask[i] ? vs1[i] : vs2[i] +vmerge.vxm vd, vs2, rs1, v0 # vd[i] = v0.mask[i] ? x[rs1] : vs2[i] +vmerge.vim vd, vs2, imm, v0 # vd[i] = v0.mask[i] ? imm : vs2[i] +---- + +==== Vector Integer Move Instructions + +The vector integer move instructions copy a source operand to a vector +register group. +The `vmv.v.v` variant copies a vector register group, whereas the `vmv.v.x` +and `vmv.v.i` variants __splat__ a scalar register or immediate to all active +elements of the destination vector register group. +These instructions are encoded as unmasked instructions (`vm=1`). +The first operand specifier (`vs2`) must contain `v0`, and any other vector +register number in `vs2` is _reserved_. + +---- +vmv.v.v vd, vs1 # vd[i] = vs1[i] +vmv.v.x vd, rs1 # vd[i] = x[rs1] +vmv.v.i vd, imm # vd[i] = imm +---- + +NOTE: Mask values can be widened into SEW-width elements using a +sequence `vmv.v.i vd, 0; vmerge.vim vd, vd, 1, v0`. + +NOTE: The vector integer move instructions share the encoding with the vector +merge instructions, but with `vm=1` and `vs2=v0`. + +The form `vmv.v.v vd, vd`, which leaves body elements unchanged, +can be used to indicate that the register will next be used +with an EEW equal to SEW. 
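
The merge selection rule and the mask-widening sequence from the note above can be sketched as follows. This is a minimal model with hypothetical function names; registers are Python lists covering only the body elements, and `v0` is a list of 0/1 mask bits.

```python
def vmerge_vvm(vs2, vs1, v0_mask):
    """vd[i] = v0.mask[i] ? vs1[i] : vs2[i]"""
    return [b if m else a for a, b, m in zip(vs2, vs1, v0_mask)]


def vmerge_vim(vs2, imm, v0_mask):
    """vd[i] = v0.mask[i] ? imm : vs2[i]"""
    return [imm if m else a for a, m in zip(vs2, v0_mask)]


def vmv_v_i(vl, imm):
    """Splat an immediate to vl elements."""
    return [imm] * vl


def widen_mask(v0_mask):
    """vmv.v.i vd, 0; vmerge.vim vd, vd, 1, v0"""
    vd = vmv_v_i(len(v0_mask), 0)
    return vmerge_vim(vd, 1, v0_mask)
```

Note that the mask here selects between the two sources for every body element, rather than deciding which elements are written.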
+ +NOTE: Implementations that internally reorganize data according to EEW +can shuffle the internal representation according to SEW. +Implementations that do not internally reorganize data can dynamically +elide this instruction, and treat it as a NOP. + +NOTE: The `vmv.v.v vd, vd` instruction is not a RISC-V HINT as a +tail-agnostic setting may cause an architectural state change on some +implementations. + +[[sec-vector-fixed-point]] +=== Vector Fixed-Point Arithmetic Instructions + +The preceding set of integer arithmetic instructions is extended to support +fixed-point arithmetic. + +A fixed-point number is a two's-complement signed or unsigned integer +interpreted as the numerator in a fraction with an implicit denominator. +The fixed-point instructions are intended to be applied to the numerators; +it is the responsibility of software to manage the denominators. +An N-bit element can hold two's-complement signed integers in the +range -2^N-1^...+2^N-1^-1, and unsigned integers in the range 0 +... +2^N^-1. The fixed-point instructions help preserve precision in +narrow operands by supporting scaling and rounding, and can handle +overflow by saturating results into the destination format range. + +NOTE: The widening integer operations described above can also be used +to avoid overflow. + +==== Vector Single-Width Saturating Add and Subtract + +Saturating forms of integer add and subtract are provided, for both +signed and unsigned integers. If the result would overflow the +destination, the result is replaced with the closest representable +value, and the `vxsat` bit is set. + +---- +# Saturating adds of unsigned integers. +vsaddu.vv vd, vs2, vs1, vm # Vector-vector +vsaddu.vx vd, vs2, rs1, vm # vector-scalar +vsaddu.vi vd, vs2, imm, vm # vector-immediate + +# Saturating adds of signed integers.
+vsadd.vv vd, vs2, vs1, vm # Vector-vector +vsadd.vx vd, vs2, rs1, vm # vector-scalar +vsadd.vi vd, vs2, imm, vm # vector-immediate + +# Saturating subtract of unsigned integers. +vssubu.vv vd, vs2, vs1, vm # Vector-vector +vssubu.vx vd, vs2, rs1, vm # vector-scalar + +# Saturating subtract of signed integers. +vssub.vv vd, vs2, vs1, vm # Vector-vector +vssub.vx vd, vs2, rs1, vm # vector-scalar +---- + +==== Vector Single-Width Averaging Add and Subtract + +The averaging add and subtract instructions right shift the result by +one bit and round off the result according to the setting in `vxrm`. +Both unsigned and signed versions are provided. +For `vaaddu` and `vaadd` there can be no overflow in the result. +For `vasub` and `vasubu`, overflow is ignored and the result wraps around. + +NOTE: For `vasub`, overflow occurs only when subtracting the smallest number +from the largest number under `rnu` or `rne` rounding. + +---- +# Averaging add + +# Averaging adds of unsigned integers. +vaaddu.vv vd, vs2, vs1, vm # roundoff_unsigned(vs2[i] + vs1[i], 1) +vaaddu.vx vd, vs2, rs1, vm # roundoff_unsigned(vs2[i] + x[rs1], 1) + +# Averaging adds of signed integers. +vaadd.vv vd, vs2, vs1, vm # roundoff_signed(vs2[i] + vs1[i], 1) +vaadd.vx vd, vs2, rs1, vm # roundoff_signed(vs2[i] + x[rs1], 1) + +# Averaging subtract + +# Averaging subtract of unsigned integers. +vasubu.vv vd, vs2, vs1, vm # roundoff_unsigned(vs2[i] - vs1[i], 1) +vasubu.vx vd, vs2, rs1, vm # roundoff_unsigned(vs2[i] - x[rs1], 1) + +# Averaging subtract of signed integers. +vasub.vv vd, vs2, vs1, vm # roundoff_signed(vs2[i] - vs1[i], 1) +vasub.vx vd, vs2, rs1, vm # roundoff_signed(vs2[i] - x[rs1], 1) +---- + +==== Vector Single-Width Fractional Multiply with Rounding and Saturation + +The signed fractional multiply instruction produces a 2*SEW product of +the two SEW inputs, then shifts the result right by SEW-1 bits, +rounding these bits according to `vxrm`, then saturates the result to +fit into SEW bits. 
If the result causes saturation, the `vxsat` bit +is set. + +---- +# Signed saturating and rounding fractional multiply +# See vxrm description for rounding calculation +vsmul.vv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*vs1[i], SEW-1)) +vsmul.vx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*x[rs1], SEW-1)) +---- + +NOTE: When multiplying two N-bit signed numbers, the largest magnitude +is obtained for -2^N-1^ * -2^N-1^ producing a result +2^2N-2^, which +has a single (zero) sign bit when held in 2N bits. All other products +have two sign bits in 2N bits. To retain greater precision in N +result bits, the product is shifted right by one bit less than N, +saturating the largest magnitude result but increasing result +precision by one bit for all other products. + +NOTE: We do not provide an equivalent fractional multiply where one +input is unsigned, as these would retain all upper SEW bits and would +not need to saturate. This operation is partly covered by the +`vmulhu` and `vmulhsu` instructions, for the case where rounding is +simply truncation (`rdn`). + +==== Vector Single-Width Scaling Shift Instructions + +These instructions shift the input value right, and round off the +shifted out bits according to `vxrm`. The scaling right shifts have +both zero-extending (`vssrl`) and sign-extending (`vssra`) forms. The +data to be shifted is in the vector register group specified by `vs2` +and the shift amount value can come from a vector register group +`vs1`, a scalar integer register `rs1`, or a zero-extended 5-bit +immediate. Only the low lg2(SEW) bits of the shift-amount value are +used to control the shift amount. 
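
The rounding applied by the scaling shifts can be sketched with a small model. The `vxrm` rounding-increment rules used here (`rnu`, `rne`, `rdn`, `rod`) are taken from the `vxrm` description elsewhere in the specification, not from this section, and should be treated as an assumption of the sketch; the Python names are hypothetical, and masking and `vl` are not modeled.

```python
def rounding_increment(v, d, mode):
    """Increment r added to (v >> d) for the given vxrm mode."""
    if d == 0:
        return 0                      # nothing is shifted out
    bit = lambda n: (v >> n) & 1
    low_nonzero = lambda n: (v & ((1 << n) - 1)) != 0   # any of bits n-1..0 set
    if mode == "rnu":                 # round-to-nearest-up
        return bit(d - 1)
    if mode == "rne":                 # round-to-nearest-even
        return bit(d - 1) & int(low_nonzero(d - 1) or bit(d) == 1)
    if mode == "rdn":                 # round-down (truncate)
        return 0
    if mode == "rod":                 # round-to-odd
        return int(bit(d) == 0 and low_nonzero(d))
    raise ValueError(mode)


def roundoff(v, d, mode):
    """(v >> d) + r; Python's >> is arithmetic, so v may be negative."""
    return (v >> d) + rounding_increment(v, d, mode)


def vssrl(vs2, shamt, sew, mode):
    """Scaling logical right shift; only the low lg2(SEW) shift bits are used."""
    mask = (1 << sew) - 1
    d = shamt & (sew - 1)
    return [roundoff(x & mask, d, mode) & mask for x in vs2]


def vssra(vs2, shamt, sew, mode):
    """Scaling arithmetic right shift on SEW-bit two's-complement patterns."""
    mask = (1 << sew) - 1
    d = shamt & (sew - 1)
    signed = [(x & mask) - (1 << sew) if (x >> (sew - 1)) & 1 else x & mask
              for x in vs2]
    return [roundoff(x, d, mode) & mask for x in signed]
```

For example, shifting 5 right by one bit gives 3 under `rnu` and `rod`, but 2 under `rne` and `rdn`.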
+ +---- + # Scaling shift right logical + vssrl.vv vd, vs2, vs1, vm # vd[i] = roundoff_unsigned(vs2[i], vs1[i]) + vssrl.vx vd, vs2, rs1, vm # vd[i] = roundoff_unsigned(vs2[i], x[rs1]) + vssrl.vi vd, vs2, uimm, vm # vd[i] = roundoff_unsigned(vs2[i], uimm) + + # Scaling shift right arithmetic + vssra.vv vd, vs2, vs1, vm # vd[i] = roundoff_signed(vs2[i],vs1[i]) + vssra.vx vd, vs2, rs1, vm # vd[i] = roundoff_signed(vs2[i], x[rs1]) + vssra.vi vd, vs2, uimm, vm # vd[i] = roundoff_signed(vs2[i], uimm) +---- + +==== Vector Narrowing Fixed-Point Clip Instructions + +The `vnclip` instructions are used to pack a fixed-point value into a +narrower destination. The instructions support rounding, scaling, and +saturation into the final destination format. The source data is in +the vector register group specified by `vs2`. The scaling shift amount +value can come from a vector register group `vs1`, a scalar integer +register `rs1`, or a zero-extended 5-bit immediate. The low +lg2(2*SEW) bits of the vector or scalar shift-amount value (e.g., the +low 6 bits for a SEW=64-bit to SEW=32-bit narrowing operation) are +used to control the right shift amount, which provides the scaling. +---- +# Narrowing unsigned clip +# SEW 2*SEW SEW + vnclipu.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], vs1[i])) + vnclipu.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], x[rs1])) + vnclipu.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_unsigned(vs2[i], uimm)) + +# Narrowing signed clip + vnclip.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i], vs1[i])) + vnclip.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i], x[rs1])) + vnclip.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_signed(vs2[i], uimm)) +---- + +For `vnclipu`/`vnclip`, the rounding mode is specified in the `vxrm` +CSR. Rounding occurs around the least-significant bit of the +destination and before saturation. 
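
That rounding happens before saturation can be seen in a small model of the clip instructions. This is a sketch under stated assumptions: the Python names are hypothetical, only the `rnu` and `rdn` rounding modes are modeled (with increments as given in the `vxrm` description elsewhere in the specification), sources are 2*SEW-bit patterns, and each function returns the results together with a flag standing in for the `vxsat` bit.

```python
def roundoff(v, d, mode):
    """(v >> d) + r for the rnu and rdn modes only."""
    if d == 0 or mode == "rdn":
        r = 0
    elif mode == "rnu":
        r = (v >> (d - 1)) & 1
    else:
        raise ValueError(mode)
    return (v >> d) + r


def vnclipu(vs2, shamt, sew, mode):
    """Narrowing unsigned clip: 2*SEW-bit sources to SEW-bit results."""
    wide_mask = (1 << (2 * sew)) - 1
    d = shamt & (2 * sew - 1)          # low lg2(2*SEW) bits of the shift amount
    upper = (1 << sew) - 1
    result, vxsat = [], False
    for x in vs2:
        r = roundoff(x & wide_mask, d, mode)
        if r > upper:
            r, vxsat = upper, True
        result.append(r)
    return result, vxsat


def vnclip(vs2, shamt, sew, mode):
    """Narrowing signed clip: 2*SEW-bit sources to SEW-bit results."""
    wide = 2 * sew
    d = shamt & (wide - 1)
    lower, upper = -(1 << (sew - 1)), (1 << (sew - 1)) - 1
    result, vxsat = [], False
    for x in vs2:
        x &= (1 << wide) - 1
        if x >> (wide - 1):
            x -= 1 << wide             # reinterpret as negative
        r = roundoff(x, d, mode)
        if r < lower or r > upper:
            r, vxsat = max(lower, min(upper, r)), True
        result.append(r & ((1 << sew) - 1))
    return result, vxsat
```

With SEW=8, the source `0x0FF8` shifted right by 4 yields 255 under `rdn` without saturating, whereas under `rnu` it rounds up to 256 first and is then clipped to 255 with the saturation flag set.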
+ +For `vnclipu`, the shifted rounded source value is treated as an +unsigned integer and saturates if the result would overflow the +destination viewed as an unsigned integer. + +NOTE: There is no single instruction that can saturate a signed value +into an unsigned destination. A sequence of two vector instructions +that first removes negative numbers by performing a max against 0 +using `vmax` then clips the resulting unsigned value into the +destination using `vnclipu` can be used if setting `vxsat` value for +negative numbers is not required. A `vsetvli` is required in between +these two instructions to change SEW. + +For `vnclip`, the shifted rounded source value is treated as a signed +integer and saturates if the result would overflow the destination viewed +as a signed integer. + +If any destination element is saturated, the `vxsat` bit is set in the +`vxsat` register. + +[[sec-vector-float]] +=== Vector Floating-Point Instructions + +The standard vector floating-point instructions treat elements as +IEEE-754/2008-compatible values. If the EEW of a vector +floating-point operand does not correspond to a supported IEEE +floating-point type, the instruction encoding is reserved. + +NOTE: Whether floating-point is supported, and for which element +widths, is determined by the specific vector extension. The current +set of extensions includes support for 32-bit and 64-bit floating-point +values. When 16-bit and 128-bit element widths are added, they will +also be treated as IEEE-754/2008-compatible values. Other +floating-point formats may be supported in future extensions. + +Vector floating-point instructions require the presence of base scalar +floating-point extensions corresponding to the supported vector +floating-point element widths. + +NOTE: In particular, future vector extensions supporting 16-bit +half-precision floating-point values will also require some scalar +half-precision floating-point support.
+ +If the floating-point unit status field `mstatus.FS` is `Off` then any +attempt to execute a vector floating-point instruction will raise an +illegal instruction exception. Any vector floating-point instruction +that modifies any floating-point extension state (i.e., floating-point +CSRs or `f` registers) must set `mstatus.FS` to `Dirty`. + +If the hypervisor extension is implemented and V=1, the `vsstatus.FS` field is +additionally in effect for vector floating-point instructions. If +`vsstatus.FS` or `mstatus.FS` is `Off` then any +attempt to execute a vector floating-point instruction will raise an +illegal instruction exception. Any vector floating-point instruction +that modifies any floating-point extension state (i.e., floating-point +CSRs or `f` registers) must set both `mstatus.FS` and `vsstatus.FS` to `Dirty`. + +The vector floating-point instructions have the same behavior as the +scalar floating-point instructions with regard to NaNs. + +Scalar values for floating-point vector-scalar operations are sourced +as described in Section <>. + +==== Vector Floating-Point Exception Flags + +A vector floating-point exception at any active floating-point element +sets the standard FP exception flags in the `fflags` register. Inactive +elements do not set FP exception flags. 
+ +==== Vector Single-Width Floating-Point Add/Subtract Instructions + +---- + # Floating-point add + vfadd.vv vd, vs2, vs1, vm # Vector-vector + vfadd.vf vd, vs2, rs1, vm # vector-scalar + + # Floating-point subtract + vfsub.vv vd, vs2, vs1, vm # Vector-vector + vfsub.vf vd, vs2, rs1, vm # Vector-scalar vd[i] = vs2[i] - f[rs1] + vfrsub.vf vd, vs2, rs1, vm # Scalar-vector vd[i] = f[rs1] - vs2[i] +---- + +==== Vector Widening Floating-Point Add/Subtract Instructions + +---- +# Widening FP add/subtract, 2*SEW = SEW +/- SEW +vfwadd.vv vd, vs2, vs1, vm # vector-vector +vfwadd.vf vd, vs2, rs1, vm # vector-scalar +vfwsub.vv vd, vs2, vs1, vm # vector-vector +vfwsub.vf vd, vs2, rs1, vm # vector-scalar + +# Widening FP add/subtract, 2*SEW = 2*SEW +/- SEW +vfwadd.wv vd, vs2, vs1, vm # vector-vector +vfwadd.wf vd, vs2, rs1, vm # vector-scalar +vfwsub.wv vd, vs2, vs1, vm # vector-vector +vfwsub.wf vd, vs2, rs1, vm # vector-scalar +---- + +==== Vector Single-Width Floating-Point Multiply/Divide Instructions + +---- + # Floating-point multiply + vfmul.vv vd, vs2, vs1, vm # Vector-vector + vfmul.vf vd, vs2, rs1, vm # vector-scalar + + # Floating-point divide + vfdiv.vv vd, vs2, vs1, vm # Vector-vector + vfdiv.vf vd, vs2, rs1, vm # vector-scalar + + # Reverse floating-point divide vector = scalar / vector + vfrdiv.vf vd, vs2, rs1, vm # scalar-vector, vd[i] = f[rs1]/vs2[i] +---- + +==== Vector Widening Floating-Point Multiply + +---- +# Widening floating-point multiply +vfwmul.vv vd, vs2, vs1, vm # vector-vector +vfwmul.vf vd, vs2, rs1, vm # vector-scalar +---- + +==== Vector Single-Width Floating-Point Fused Multiply-Add Instructions + +All four varieties of fused multiply-add are provided, and in two +destructive forms that overwrite one of the operands, either the +addend or the first multiplicand. 
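The sign conventions and the overwritten operand of the eight fused multiply-add forms in this section can be sketched in Python as below. This is an illustrative model under stated assumptions, not the specified behavior in full: Python floats stand in for SEW=64 elements, only finite inputs and round-to-nearest-even are handled, the sign of an exact zero result and all exception flags are ignored, and the names `fused` and `FORMS` are hypothetical. Exact rational arithmetic is used so that each result is rounded once.

```python
from fractions import Fraction


def fused(a, b, c):
    """a*b + c computed exactly, then rounded once to binary64 (finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Each entry maps (old vd, vs1 or f[rs1], vs2) to the new vd element.
FORMS = {
    # The *acc/*sac forms overwrite the addend or subtrahend (vd is added or subtracted).
    "vfmacc":  lambda vd, vs1, vs2: fused(vs1, vs2, vd),     # +(vs1*vs2) + vd
    "vfnmacc": lambda vd, vs1, vs2: fused(-vs1, vs2, -vd),   # -(vs1*vs2) - vd
    "vfmsac":  lambda vd, vs1, vs2: fused(vs1, vs2, -vd),    # +(vs1*vs2) - vd
    "vfnmsac": lambda vd, vs1, vs2: fused(-vs1, vs2, vd),    # -(vs1*vs2) + vd
    # The *add/*sub forms overwrite a multiplicand (vd is multiplied).
    "vfmadd":  lambda vd, vs1, vs2: fused(vs1, vd, vs2),     # +(vs1*vd) + vs2
    "vfnmadd": lambda vd, vs1, vs2: fused(-vs1, vd, -vs2),   # -(vs1*vd) - vs2
    "vfmsub":  lambda vd, vs1, vs2: fused(vs1, vd, -vs2),    # +(vs1*vd) - vs2
    "vfnmsub": lambda vd, vs1, vs2: fused(-vs1, vd, vs2),    # -(vs1*vd) + vs2
}
```

The single rounding is what distinguishes a fused operation from a multiply followed by an add: with `a = 1.0 + 2.0**-30`, the unfused expression `a * a - (1.0 + 2.0**-29)` evaluates to zero in binary64, whereas the fused form retains the low-order term of the exact product.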
+ +---- +# FP multiply-accumulate, overwrites addend +vfmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] +vfmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i] + +# FP negate-(multiply-accumulate), overwrites subtrahend +vfnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i] +vfnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i] + +# FP multiply-subtract-accumulator, overwrites subtrahend +vfmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i] +vfmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i] + +# FP negate-(multiply-subtract-accumulator), overwrites minuend +vfnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i] +vfnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i] + +# FP multiply-add, overwrites multiplicand +vfmadd.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) + vs2[i] +vfmadd.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) + vs2[i] + +# FP negate-(multiply-add), overwrites multiplicand +vfnmadd.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) - vs2[i] +vfnmadd.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vd[i]) - vs2[i] + +# FP multiply-sub, overwrites multiplicand +vfmsub.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) - vs2[i] +vfmsub.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) - vs2[i] + +# FP negate-(multiply-sub), overwrites multiplicand +vfnmsub.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) + vs2[i] +vfnmsub.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vd[i]) + vs2[i] +---- + +NOTE: While we considered using the two unused rounding modes +in the scalar FP FMA encoding to provide a few non-destructive FMAs, +these would complicate microarchitectures by being the only maskable +operation with three inputs and separate output. + +==== Vector Widening Floating-Point Fused Multiply-Add Instructions + +The widening floating-point fused multiply-add instructions all +overwrite the wide addend with the result. 
The multiplier inputs are +all SEW wide, while the addend and destination are 2*SEW bits wide. + +---- +# FP widening multiply-accumulate, overwrites addend +vfwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] +vfwmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i] + +# FP widening negate-(multiply-accumulate), overwrites addend +vfwnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i] +vfwnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i] + +# FP widening multiply-subtract-accumulator, overwrites addend +vfwmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i] +vfwmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i] + +# FP widening negate-(multiply-subtract-accumulator), overwrites addend +vfwnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i] +vfwnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i] +---- + +==== Vector Floating-Point Square-Root Instruction + +This is a unary vector-vector instruction. + +---- + # Floating-point square root + vfsqrt.v vd, vs2, vm # Vector-vector square root +---- + +==== Vector Floating-Point Reciprocal Square-Root Estimate Instruction + +---- + # Floating-point reciprocal square-root estimate to 7 bits. + vfrsqrt7.v vd, vs2, vm +---- + +This is a unary vector-vector instruction that returns an estimate of +1/sqrt(x) accurate to 7 bits. + +NOTE: An earlier draft version had used the assembler name `vfrsqrte7` +but this was deemed to cause confusion with the ``e``__x__ notation for element +width. The earlier name can be retained as an alias in tool chains for +backward compatibility.
+ +The following table describes the instruction's behavior for all +classes of floating-point inputs: + +[cols="1,1,1"] +[%autowidth] +|=== +| Input | Output | Exceptions raised + +| -{inf} {le} _x_ < -0.0 | canonical NaN | NV +| -0.0 | -{inf} | DZ +| +0.0 | +{inf} | DZ +| +0.0 < _x_ < +{inf} | _estimate of 1/sqrt(x)_ | +| +{inf} | +0.0 | +| qNaN | canonical NaN | +| sNaN | canonical NaN | NV +|=== + +NOTE: All positive normal and subnormal inputs produce normal outputs. + +NOTE: The output value is independent of the dynamic rounding mode. + +For the non-exceptional cases, the low bit of the exponent and the six high +bits of significand (after the leading one) are concatenated and used to +address the following table. +The output of the table becomes the seven high bits of the result significand +(after the leading one); the remainder of the result significand is zero. +Subnormal inputs are normalized and the exponent adjusted appropriately before +the lookup. +The output exponent is chosen to make the result approximate the reciprocal of +the square root of the argument. + +More precisely, the result is computed as follows. +Let the normalized input exponent be equal to the input exponent if the input +is normal, or 0 minus the number of leading zeros in the significand +otherwise. +If the input is subnormal, the normalized input significand is given by +shifting the input significand left by 1 minus the normalized input exponent, +discarding the leading 1 bit. +The output exponent equals floor((3*B - 1 - the normalized input exponent) / 2), +where B is the exponent bias. The output sign equals the input sign. + +The following table gives the seven MSBs of the output significand as a +function of the LSB of the normalized input exponent and the six MSBs of the +normalized input significand; the other bits of the output significand are zero. 
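The exponent and table-index steps described above can be sketched for SEW=32 as follows. This is a minimal illustration that assumes the binary32 layout (8-bit exponent, 23-bit significand, bias B=127) and covers only positive, finite, non-zero inputs. The function name is hypothetical, and the lookup table itself is deliberately not reproduced.

```python
BIAS = 127          # exponent bias B for binary32
SIG_BITS = 23       # width of the significand field


def vfrsqrt7_exponent_and_index(bits):
    """Return (output exponent field, 7-bit table index) for a positive,
    finite, non-zero binary32 input given as its 32-bit pattern."""
    exp = (bits >> SIG_BITS) & 0xFF
    sig = bits & ((1 << SIG_BITS) - 1)
    if exp != 0:
        norm_exp, norm_sig = exp, sig            # normal input: used as-is
    else:
        leading_zeros = SIG_BITS - sig.bit_length()
        norm_exp = 0 - leading_zeros
        # Shift left by 1 - norm_exp and discard the leading 1 bit.
        norm_sig = (sig << (1 - norm_exp)) & ((1 << SIG_BITS) - 1)
    out_exp = (3 * BIAS - 1 - norm_exp) // 2     # floor division
    # Index: LSB of the normalized exponent, then the six significand MSBs.
    index = ((norm_exp & 1) << 6) | (norm_sig >> (SIG_BITS - 6))
    return out_exp, index
```

For the two SEW=32 inputs used as examples in this section, the sketch yields output exponent fields 190 (`0xbe`) and 63 (`0x3f`), which match the exponent fields of `0x5f080000` and `0x1f820000`.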
+ +include::vfrsqrt7.adoc[] + +NOTE: For example, when SEW=32, vfrsqrt7(0x00718abc ({approx} 1.043e-38)) = 0x5f080000 ({approx} 9.800e18), and vfrsqrt7(0x7f765432 ({approx} 3.274e38)) = 0x1f820000 ({approx} 5.506e-20). + +NOTE: The 7 bit accuracy was chosen as it requires 0,1,2,3 +Newton-Raphson iterations to converge to close to bfloat16, FP16, +FP32, FP64 accuracy respectively. Future instructions can be defined +with greater estimate accuracy. + +==== Vector Floating-Point Reciprocal Estimate Instruction + +---- + # Floating-point reciprocal estimate to 7 bits. + vfrec7.v vd, vs2, vm +---- + +NOTE: An earlier draft version had used the assembler name `vfrece7` +but this was deemed to cause confusion with ``e``__x__ notation for element +width. The earlier name can be retained as alias in tool chains for +backward compatibility. + +This is a unary vector-vector instruction that returns an estimate of +1/x accurate to 7 bits. + +The following table describes the instruction's behavior for all +classes of floating-point inputs, where _B_ is the exponent bias: + +[cols="1,1,1,1"] +[%autowidth] +|=== +| Input (_x_) | Rounding Mode | Output (_y_ {approx} _1/x_) | Exceptions raised + +| -{inf} | _any_ | -0.0 | +| -2^B+1^ < _x_ {le} -2^B^ (normal) | _any_ | -2^-(B+1)^ {ge} _y_ > -2^-B^ (subnormal, sig=01...) | +| -2^B^ < _x_ {le} -2^B-1^ (normal) | _any_ | -2^-B^ {ge} _y_ > -2^-B+1^ (subnormal, sig=1...) | +| -2^B-1^ < _x_ {le} -2^-B+1^ (normal) | _any_ | -2^-B+1^ {ge} _y_ > -2^B-1^ (normal) | +| -2^-B+1^ < _x_ {le} -2^-B^ (subnormal, sig=1...) | _any_ | -2^B-1^ {ge} _y_ > -2^B^ (normal) | +| -2^-B^ < _x_ {le} -2^-(B+1)^ (subnormal, sig=01...) | _any_ | -2^B^ {ge} _y_ > -2^B+1^ (normal) | +| -2^-(B+1)^ < _x_ < -0.0 (subnormal, sig=00...) | RUP, RTZ | greatest-mag. negative finite value | NX, OF +| -2^-(B+1)^ < _x_ < -0.0 (subnormal, sig=00...) 
| RDN, RNE, RMM | -{inf} | NX, OF +| -0.0 | _any_ | -{inf} | DZ +| +0.0 | _any_ | +{inf} | DZ +| +0.0 < _x_ < 2^-(B+1)^ (subnormal, sig=00...) | RUP, RNE, RMM | +{inf} | NX, OF +| +0.0 < _x_ < 2^-(B+1)^ (subnormal, sig=00...) | RDN, RTZ | greatest finite value | NX, OF +| 2^-(B+1)^ {le} _x_ < 2^-B^ (subnormal, sig=01...) | _any_ | 2^B+1^ > _y_ {ge} 2^B^ (normal) | +| 2^-B^ {le} _x_ < 2^-B+1^ (subnormal, sig=1...) | _any_ | 2^B^ > _y_ {ge} 2^B-1^ (normal) | +| 2^-B+1^ {le} _x_ < 2^B-1^ (normal) | _any_ | 2^B-1^ > _y_ {ge} 2^-B+1^ (normal) | +| 2^B-1^ {le} _x_ < 2^B^ (normal) | _any_ | 2^-B+1^ > _y_ {ge} 2^-B^ (subnormal, sig=1...) | +| 2^B^ {le} _x_ < 2^B+1^ (normal) | _any_ | 2^-B^ > _y_ {ge} 2^-(B+1)^ (subnormal, sig=01...) | +| +{inf} | _any_ | +0.0 | +| qNaN | _any_ | canonical NaN | +| sNaN | _any_ | canonical NaN | NV +|=== + +NOTE: Subnormal inputs with magnitude at least 2^-(B+1)^ produce normal outputs; +other subnormal inputs produce infinite outputs. +Normal inputs with magnitude at least 2^B-1^ produce subnormal outputs; +other normal inputs produce normal outputs. + +NOTE: The output value depends on the dynamic rounding mode when +the overflow exception is raised. + +For the non-exceptional cases, the seven high bits of significand (after the +leading one) are used to address the following table. +The output of the table becomes the seven high bits of the result significand +(after the leading one); the remainder of the result significand is zero. +Subnormal inputs are normalized and the exponent adjusted appropriately before +the lookup. +The output exponent is chosen to make the result approximate the reciprocal of +the argument, and subnormal outputs are denormalized accordingly. + +More precisely, the result is computed as follows. +Let the normalized input exponent be equal to the input exponent if the input +is normal, or 0 minus the number of leading zeros in the significand +otherwise. 
+The normalized output exponent equals (2*B - 1 - the normalized input exponent). +If the normalized output exponent is outside the range [-1, 2*B], the result +corresponds to one of the exceptional cases in the table above. + +If the input is subnormal, the normalized input significand is given by +shifting the input significand left by 1 minus the normalized input exponent, +discarding the leading 1 bit. +Otherwise, the normalized input significand equals the input significand. +The following table gives the seven MSBs of the normalized output significand +as a function of the seven MSBs of the normalized input significand; the other +bits of the normalized output significand are zero. + +include::vfrec7.adoc[] + +If the normalized output exponent is 0 or -1, the result is subnormal: the +output exponent is 0, and the output significand is given by concatenating +a 1 bit to the left of the normalized output significand, then shifting that +quantity right by 1 minus the normalized output exponent. +Otherwise, the output exponent equals the normalized output exponent, and the +output significand equals the normalized output significand. +The output sign equals the input sign. + +NOTE: For example, when SEW=32, vfrec7(0x00718abc ({approx} 1.043e-38)) = 0x7e900000 ({approx} 9.570e37), and vfrec7(0x7f765432 ({approx} 3.274e38)) = 0x00214000 ({approx} 3.053e-39). + +NOTE: The 7 bit accuracy was chosen as it requires 0,1,2,3 +Newton-Raphson iterations to converge to close to bfloat16, FP16, +FP32, FP64 accuracy respectively. Future instructions can be defined +with greater estimate accuracy. + +==== Vector Floating-Point MIN/MAX Instructions + +The vector floating-point `vfmin` and `vfmax` instructions have the +same behavior as the corresponding scalar floating-point instructions +in version 2.2 of the RISC-V F/D/Q extension. 
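As a reminder of what the scalar version 2.2 semantics imply per element, here is a small Python sketch of `vfmin` for one element pair. It assumes Python floats as SEW=64 elements, represents the canonical NaN by `math.nan`, and does not model the invalid flag raised for signaling NaN inputs. The function name is hypothetical.

```python
import math


def vfmin_element(a, b):
    """Minimum of one element pair following the scalar FMIN rules (flags not modeled)."""
    if math.isnan(a) and math.isnan(b):
        return math.nan                  # both NaN: canonical NaN
    if math.isnan(a):
        return b                         # exactly one NaN: the non-NaN operand
    if math.isnan(b):
        return a
    if a == b:
        # Covers +0.0 vs -0.0: -0.0 is treated as less than +0.0.
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b
```

`vfmax` follows the same pattern with the comparison reversed and +0.0 preferred over -0.0.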
+ +---- + # Floating-point minimum + vfmin.vv vd, vs2, vs1, vm # Vector-vector + vfmin.vf vd, vs2, rs1, vm # vector-scalar + + # Floating-point maximum + vfmax.vv vd, vs2, vs1, vm # Vector-vector + vfmax.vf vd, vs2, rs1, vm # vector-scalar +---- + +==== Vector Floating-Point Sign-Injection Instructions + +Vector versions of the scalar sign-injection instructions. The result +takes all bits except the sign bit from the vector `vs2` operands. + +---- + vfsgnj.vv vd, vs2, vs1, vm # Vector-vector + vfsgnj.vf vd, vs2, rs1, vm # vector-scalar + + vfsgnjn.vv vd, vs2, vs1, vm # Vector-vector + vfsgnjn.vf vd, vs2, rs1, vm # vector-scalar + + vfsgnjx.vv vd, vs2, vs1, vm # Vector-vector + vfsgnjx.vf vd, vs2, rs1, vm # vector-scalar +---- + +NOTE: A vector of floating-point values can be negated using a +sign-injection instruction with both source operands set to the same +vector operand. An assembly pseudoinstruction is provided: `vfneg.v vd,vs` = `vfsgnjn.vv vd,vs,vs`. + +NOTE: The absolute value of a vector of floating-point elements can be +calculated using a sign-injection instruction with both source +operands set to the same vector operand. An assembly +pseudoinstruction is provided: `vfabs.v vd,vs` = `vfsgnjx.vv vd,vs,vs`. + +==== Vector Floating-Point Compare Instructions + +These vector FP compare instructions compare two source operands and +write the comparison result to a mask register. The destination mask +vector is always held in a single vector register, with a layout of +elements as described in Section <>. The +destination mask vector register may be the same as the source vector +mask register (`v0`). Compares write mask registers, and so always +operate under a tail-agnostic policy. + +The compare instructions follow the semantics of the scalar +floating-point compare instructions. `vmfeq` and `vmfne` raise the invalid +operation exception only on signaling NaN inputs. 
`vmflt`, `vmfle`, `vmfgt`, +and `vmfge` raise the invalid operation exception on both signaling and +quiet NaN inputs. +`vmfne` writes 1 to the destination element when either +operand is NaN, whereas the other compares write 0 when either operand +is NaN. + +---- + # Compare equal + vmfeq.vv vd, vs2, vs1, vm # Vector-vector + vmfeq.vf vd, vs2, rs1, vm # vector-scalar + + # Compare not equal + vmfne.vv vd, vs2, vs1, vm # Vector-vector + vmfne.vf vd, vs2, rs1, vm # vector-scalar + + # Compare less than + vmflt.vv vd, vs2, vs1, vm # Vector-vector + vmflt.vf vd, vs2, rs1, vm # vector-scalar + + # Compare less than or equal + vmfle.vv vd, vs2, vs1, vm # Vector-vector + vmfle.vf vd, vs2, rs1, vm # vector-scalar + + # Compare greater than + vmfgt.vf vd, vs2, rs1, vm # vector-scalar + + # Compare greater than or equal + vmfge.vf vd, vs2, rs1, vm # vector-scalar +---- + +---- +Comparison Assembler Mapping Assembler pseudoinstruction + +va < vb vmflt.vv vd, va, vb, vm +va <= vb vmfle.vv vd, va, vb, vm +va > vb vmflt.vv vd, vb, va, vm vmfgt.vv vd, va, vb, vm +va >= vb vmfle.vv vd, vb, va, vm vmfge.vv vd, va, vb, vm + +va < f vmflt.vf vd, va, f, vm +va <= f vmfle.vf vd, va, f, vm +va > f vmfgt.vf vd, va, f, vm +va >= f vmfge.vf vd, va, f, vm + +va, vb vector register groups +f scalar floating-point register +---- + +NOTE: Providing all forms is necessary to correctly handle unordered +compares for NaNs. + +NOTE: C99 floating-point quiet compares can be implemented by masking +the signaling compares when either input is NaN, as follows. When +the comparand is a non-NaN constant, the middle two instructions can be +omitted. + +---- + # Example of implementing isgreater() + vmfeq.vv v0, va, va # Only set where A is not NaN. + vmfeq.vv v1, vb, vb # Only set where B is not NaN. + vmand.mm v0, v0, v1 # Only set where A and B are ordered, + vmfgt.vv v0, va, vb, v0.t # so only set flags on ordered values. 
+---- + +NOTE: In the above sequence, it is tempting to mask the second `vmfeq` +instruction and remove the `vmand` instruction, but this more efficient +sequence incorrectly fails to raise the invalid exception when an +element of `va` contains a quiet NaN and the corresponding element in +`vb` contains a signaling NaN. + +==== Vector Floating-Point Classify Instruction + +This is a unary vector-vector instruction that operates in the same +way as the scalar classify instruction. + +---- + vfclass.v vd, vs2, vm # Vector-vector +---- + +The 10-bit mask produced by this instruction is placed in the +least-significant bits of the result elements. The upper (SEW-10) +bits of the result are filled with zeros. The instruction is only +defined for SEW=16b and above, so the result will always fit in the +destination elements. + +==== Vector Floating-Point Merge Instruction + +A vector-scalar floating-point merge instruction is provided, which +operates on all body elements from `vstart` up to the current vector +length in `vl` regardless of mask value. + +The `vfmerge.vfm` instruction is encoded as a masked instruction (`vm=0`). +At elements where the mask value is zero, the first vector operand is +copied to the destination element, otherwise a scalar floating-point +register value is copied to the destination element. + +---- +vfmerge.vfm vd, vs2, rs1, v0 # vd[i] = v0.mask[i] ? f[rs1] : vs2[i] +---- + +[[sec-vector-float-move]] +==== Vector Floating-Point Move Instruction + +The vector floating-point move instruction __splats__ a floating-point +scalar operand to a vector register group. The instruction copies a +scalar `f` register value to all active elements of a vector register +group. This instruction is encoded as an unmasked instruction (`vm=1`). +The instruction must have the `vs2` field set to `v0`, with all other +values for `vs2` reserved. 
+ +---- +vfmv.v.f vd, rs1 # vd[i] = f[rs1] +---- + +NOTE: The `vfmv.v.f` instruction shares the encoding with the `vfmerge.vfm` +instruction, but with `vm=1` and `vs2=v0`. + +==== Single-Width Floating-Point/Integer Type-Convert Instructions + +Conversion operations are provided to convert to and from +floating-point values and unsigned and signed integers, where both +source and destination are SEW wide. + +---- +vfcvt.xu.f.v vd, vs2, vm # Convert float to unsigned integer. +vfcvt.x.f.v vd, vs2, vm # Convert float to signed integer. + +vfcvt.rtz.xu.f.v vd, vs2, vm # Convert float to unsigned integer, truncating. +vfcvt.rtz.x.f.v vd, vs2, vm # Convert float to signed integer, truncating. + +vfcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to float. +vfcvt.f.x.v vd, vs2, vm # Convert signed integer to float. +---- + +The conversions follow the same rules on exceptional conditions as the +scalar conversion instructions. +The conversions use the dynamic rounding mode in `frm`, except for the `rtz` +variants, which round towards zero. + +NOTE: The `rtz` variants are provided to accelerate truncating conversions +from floating-point to integer, as is common in languages like C and Java. + +==== Widening Floating-Point/Integer Type-Convert Instructions + +A set of conversion instructions is provided to convert between +narrower integer and floating-point datatypes to a type of twice the +width. + +---- +vfwcvt.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer. +vfwcvt.x.f.v vd, vs2, vm # Convert float to double-width signed integer. + +vfwcvt.rtz.xu.f.v vd, vs2, vm # Convert float to double-width unsigned integer, truncating. +vfwcvt.rtz.x.f.v vd, vs2, vm # Convert float to double-width signed integer, truncating. + +vfwcvt.f.xu.v vd, vs2, vm # Convert unsigned integer to double-width float. +vfwcvt.f.x.v vd, vs2, vm # Convert signed integer to double-width float. + +vfwcvt.f.f.v vd, vs2, vm # Convert single-width float to double-width float. 
+---- + +These instructions have the same constraints on vector register overlap +as other widening instructions (see <>). + +NOTE: A double-width IEEE floating-point value can always represent a +single-width integer exactly. + +NOTE: A double-width IEEE floating-point value can always represent a +single-width IEEE floating-point value exactly. + +NOTE: A full set of floating-point widening conversions is not +supported as single instructions, but any widening conversion can be +implemented as several doubling steps with equivalent results and no +additional exception flags raised. + +==== Narrowing Floating-Point/Integer Type-Convert Instructions + +A set of conversion instructions is provided to convert wider integer +and floating-point datatypes to a type of half the width. + +---- +vfncvt.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer. +vfncvt.x.f.w vd, vs2, vm # Convert double-width float to signed integer. + +vfncvt.rtz.xu.f.w vd, vs2, vm # Convert double-width float to unsigned integer, truncating. +vfncvt.rtz.x.f.w vd, vs2, vm # Convert double-width float to signed integer, truncating. + +vfncvt.f.xu.w vd, vs2, vm # Convert double-width unsigned integer to float. +vfncvt.f.x.w vd, vs2, vm # Convert double-width signed integer to float. + +vfncvt.f.f.w vd, vs2, vm # Convert double-width float to single-width float. +vfncvt.rod.f.f.w vd, vs2, vm # Convert double-width float to single-width float, + # rounding towards odd. +---- + +These instructions have the same constraints on vector register overlap +as other narrowing instructions (see <>). + +NOTE: A full set of floating-point narrowing conversions is not +supported as single instructions. Conversions can be implemented in +a sequence of halving steps. Results are equivalently rounded and +the same exception flags are raised if all but the last halving step +use round-towards-odd (`vfncvt.rod.f.f.w`). Only the final step +should use the desired rounding mode. 
+ +=== Vector Reduction Operations + +Vector reduction operations take a vector register group of elements +and a scalar held in element 0 of a vector register, and perform a +reduction using some binary operator, to produce a scalar result in +element 0 of a vector register. The scalar input and output operands +are held in element 0 of a single vector register, not a vector +register group, so any vector register can be the scalar source or +destination of a vector reduction regardless of LMUL setting. + +The destination vector register can overlap the source operands, +including the mask register. + +NOTE: Vector reductions read and write the scalar operand and result +into element 0 of a vector register instead of a scalar register to +avoid a loss of decoupling with the scalar processor, and to support +future polymorphic use with future types not supported in the scalar +unit. + +Inactive elements from the source vector register group are excluded +from the reduction, but the scalar operand is always included +regardless of the mask values. + +The other elements in the destination vector register ( 0 < index < +VLEN/SEW) are considered the tail and are managed with the current +tail agnostic/undisturbed policy. + +If `vl`=0, no operation is performed and the destination register is +not updated. + +NOTE: This choice of behavior for `vl`=0 reduces implementation +complexity as it is consistent with other operations on vector +register state. For the common case that the source and destination +scalar operand are the same vector register, this behavior also +produces the expected result. For the uncommon case that the source +and destination scalar operand are in different vector registers, this +instruction will not copy the source into the destination when `vl`=0. +However, it is expected that in most of these cases it will be +statically known that `vl` is not zero. 
In other cases, a check for +`vl`=0 will have to be added to ensure that the source scalar is +copied to the destination (e.g., by explicitly setting `vl`=1 and +performing a register-register copy). + +Traps on vector reduction instructions are always reported with a +`vstart` of 0. Vector reduction operations raise an illegal +instruction exception if `vstart` is non-zero. + +The assembler syntax for a reduction operation is `vredop.vs`, where +the `.vs` suffix denotes the first operand is a vector register group +and the second operand is a scalar stored in element 0 of a vector +register. + +[[sec-vector-integer-reduce]] +==== Vector Single-Width Integer Reduction Instructions + +All operands and results of single-width reduction instructions have +the same SEW width. Overflows wrap around on arithmetic sums. + +---- + # Simple reductions, where [*] denotes all active elements: + vredsum.vs vd, vs2, vs1, vm # vd[0] = sum( vs1[0] , vs2[*] ) + vredmaxu.vs vd, vs2, vs1, vm # vd[0] = maxu( vs1[0] , vs2[*] ) + vredmax.vs vd, vs2, vs1, vm # vd[0] = max( vs1[0] , vs2[*] ) + vredminu.vs vd, vs2, vs1, vm # vd[0] = minu( vs1[0] , vs2[*] ) + vredmin.vs vd, vs2, vs1, vm # vd[0] = min( vs1[0] , vs2[*] ) + vredand.vs vd, vs2, vs1, vm # vd[0] = and( vs1[0] , vs2[*] ) + vredor.vs vd, vs2, vs1, vm # vd[0] = or( vs1[0] , vs2[*] ) + vredxor.vs vd, vs2, vs1, vm # vd[0] = xor( vs1[0] , vs2[*] ) +---- + +[[sec-vector-integer-reduce-widen]] +==== Vector Widening Integer Reduction Instructions + +The unsigned `vwredsumu.vs` instruction zero-extends the SEW-wide +vector elements before summing them, then adds the 2*SEW-width scalar +element, and stores the result in a 2*SEW-width scalar element. + +The `vwredsum.vs` instruction sign-extends the SEW-wide vector +elements before summing them. + +For both `vwredsumu.vs` and `vwredsum.vs`, overflows wrap around. 
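The widening sum reductions, including the `vl`=0 and masking rules from the start of this section, can be sketched in Python as follows. This is an illustrative model only: the function name and argument layout are hypothetical, elements are passed as unsigned bit patterns, and tail handling and `vstart` are not modeled.

```python
def vwredsum_model(vd0, vs2, vs1_0, mask, vl, sew, signed=False):
    """Return the new 2*SEW-bit value of vd[0].

    vd0   -- current value of vd[0], returned unchanged when vl == 0
    vs2   -- list of SEW-bit source elements (unsigned bit patterns)
    vs1_0 -- 2*SEW-bit scalar operand taken from vs1[0]
    mask  -- list of 0/1 mask bits; inactive elements are excluded
    """
    if vl == 0:
        return vd0                          # no operation, destination not updated
    total = vs1_0                           # the scalar is always included
    for i in range(vl):
        if not mask[i]:
            continue
        elem = vs2[i] & ((1 << sew) - 1)
        if signed and elem >> (sew - 1):
            elem -= 1 << sew                # sign-extend (vwredsum.vs)
        total += elem                       # zero-extended otherwise (vwredsumu.vs)
    return total & ((1 << (2 * sew)) - 1)   # overflow wraps around at 2*SEW bits
```

With SEW=8, summing a single element `0xFF` onto a zero scalar gives `0x00FF` in the unsigned form and `0xFFFF` in the signed form.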
+ +---- + # Unsigned sum reduction into double-width accumulator + vwredsumu.vs vd, vs2, vs1, vm # 2*SEW = 2*SEW + sum(zero-extend(SEW)) + + # Signed sum reduction into double-width accumulator + vwredsum.vs vd, vs2, vs1, vm # 2*SEW = 2*SEW + sum(sign-extend(SEW)) +---- + +[[sec-vector-float-reduce]] +==== Vector Single-Width Floating-Point Reduction Instructions + +---- + # Simple reductions. + vfredosum.vs vd, vs2, vs1, vm # Ordered sum + vfredusum.vs vd, vs2, vs1, vm # Unordered sum + vfredmax.vs vd, vs2, vs1, vm # Maximum value + vfredmin.vs vd, vs2, vs1, vm # Minimum value + +---- + +NOTE: The older assembler mnemonic `vfredsum` is retained as an alias for `vfredusum`. + +===== Vector Ordered Single-Width Floating-Point Sum Reduction + +The `vfredosum` instruction must sum the floating-point values in +element order, starting with the scalar in `vs1[0]`--that is, it +performs the computation: + +---- + vd[0] = `(((vs1[0] + vs2[0]) + vs2[1]) + ...) + vs2[vl-1]` +---- +where each addition operates identically to the scalar floating-point +instructions in terms of raising exception flags and generating or +propagating special values. + +NOTE: The ordered reduction supports compiler autovectorization, while +the unordered FP sum allows for faster implementations. + +When the operation is masked (`vm=0`), the masked-off elements do not +affect the result or the exception flags. + +NOTE: If no elements are active, no additions are performed, so the scalar in +`vs1[0]` is simply copied to the destination register, without canonicalizing +NaN values and without setting any exception flags. This behavior preserves +the handling of NaNs, exceptions, and rounding when autovectorizing a scalar +summation loop. + +===== Vector Unordered Single-Width Floating-Point Sum Reduction + +The unordered sum reduction instruction, `vfredusum`, provides an +implementation more freedom in performing the reduction.
+ +The implementation must produce a result equivalent to a reduction tree +composed of binary operator nodes, with the inputs being elements from +the source vector register group (`vs2`) and the source scalar value +(`vs1[0]`). Each operator in the tree accepts two inputs and produces +one result. +Each operator first computes an exact sum as a RISC-V scalar floating-point +addition with infinite exponent range and precision, then converts this exact +sum to a floating-point format with range and precision each at least as great +as the element floating-point format indicated by SEW, rounding using the +currently active floating-point dynamic rounding mode and raising exception +flags as necessary. +A different floating-point range and precision may be chosen for the result of +each operator. +A node where one input is derived only from elements masked-off or beyond the +active vector length may either treat that input as the additive identity of the +appropriate EEW or simply copy the other input to its output. +The rounded result from the root node in the tree is converted (rounded again, +using the dynamic rounding mode) to the standard floating-point format +indicated by SEW. +An implementation +is allowed to add an additional additive identity to the final result. + +The additive identity is +0.0 when rounding down (towards -{inf}) or +-0.0 for all other rounding modes. + +The reduction tree structure must be deterministic for a given value +in `vtype` and `vl`. + +NOTE: As a consequence of this definition, implementations need not propagate +NaN payloads through the reduction tree when no elements are active. In +particular, if no elements are active and the scalar input is NaN, +implementations are permitted to canonicalize the NaN and, if the NaN is +signaling, set the invalid exception flag. Implementations are alternatively +permitted to pass through the original NaN and set no exception flags, as with +`vfredosum`. 
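To see why the ordered and unordered sums can differ, the following Python sketch compares the `vfredosum` evaluation order with one possible reduction tree for `vfredusum`. The pairwise tree shape is purely an assumption for illustration (the text above only requires that the tree be deterministic for a given `vtype` and `vl`). Python floats stand in for SEW=64 elements, exception flags are not modeled, and the function names are hypothetical.

```python
def ordered_sum(scalar, elems, mask):
    """vfredosum order: (((vs1[0] + vs2[0]) + vs2[1]) + ...), skipping inactive elements."""
    acc = scalar
    for value, active in zip(elems, mask):
        if active:
            acc = acc + value
    return acc


def pairwise_tree_sum(scalar, elems, mask):
    """One hypothetical reduction tree: adjacent pairs are summed level by level,
    with the scalar operand placed as the last leaf."""
    level = [v for v, active in zip(elems, mask) if active] + [scalar]
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])       # an unpaired value is passed through unchanged
        level = nxt
    return level[0]
```

With a scalar of `1e16` and two active elements equal to `1.0`, the ordered sum absorbs each `1.0` into the rounding error, while this tree first forms `2.0` and then adds it exactly. Both results are acceptable outcomes for `vfredusum`, but only the first is acceptable for `vfredosum`.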
+ +NOTE: The `vfredosum` instruction is a valid implementation of the +`vfredusum` instruction. + +===== Vector Single-Width Floating-Point Max and Min Reductions + +NOTE: Floating-point max and min reductions should return the same +final value and raise the same exception flags regardless of operation +order. + +NOTE: If no elements are active, the scalar in `vs1[0]` is simply copied to +the destination register, without canonicalizing NaN values and without +setting any exception flags. + +[[sec-vector-float-reduce-widen]] +==== Vector Widening Floating-Point Reduction Instructions + +Widening forms of the sum reductions are provided that +read and write a double-width reduction result. + +---- + # Simple reductions. + vfwredosum.vs vd, vs2, vs1, vm # Ordered sum + vfwredusum.vs vd, vs2, vs1, vm # Unordered sum +---- + +NOTE: Older assembler mnemonic `vfwredsum` is retained as alias for `vfwredusum`. + +The reduction of the SEW-width elements is performed as in the +single-width reduction case, with the elements in `vs2` promoted +to 2*SEW bits before adding to the 2*SEW-bit accumulator. + +NOTE: `vfwredosum.vs` handles inactive elements and NaN payloads analogously +to `vfredosum.vs`; `vfwredusum.vs` does so analogously to `vfredusum.vs`. + +[[sec-vector-mask]] +=== Vector Mask Instructions + +Several instructions are provided to help operate on mask values held in +a vector register. + +[[sec-mask-register-logical]] +==== Vector Mask-Register Logical Instructions + +Vector mask-register logical operations operate on mask registers. +Each element in a mask register is a single bit, so these instructions +all operate on single vector registers regardless of the setting of +the `vlmul` field in `vtype`. They do not change the value of +`vlmul`. The destination vector register may be the same as either +source vector register. 
+ +As with other vector instructions, the elements with indices less than +`vstart` are unchanged, and `vstart` is reset to zero after execution. +Vector mask logical instructions are always unmasked, so there are no +inactive elements, and the encodings with `vm=0` are reserved. +Mask elements past `vl`, the tail elements, are +always updated with a tail-agnostic policy. + +---- + vmand.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] && vs1.mask[i] + vmnand.mm vd, vs2, vs1 # vd.mask[i] = !(vs2.mask[i] && vs1.mask[i]) + vmandn.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] && !vs1.mask[i] + vmxor.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] ^^ vs1.mask[i] + vmor.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] || vs1.mask[i] + vmnor.mm vd, vs2, vs1 # vd.mask[i] = !(vs2.mask[i] || vs1.mask[i]) + vmorn.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] || !vs1.mask[i] + vmxnor.mm vd, vs2, vs1 # vd.mask[i] = !(vs2.mask[i] ^^ vs1.mask[i]) +---- + +NOTE: The previous assembler mnemonics `vmandnot` and `vmornot` have +been changed to `vmandn` and `vmorn` to be consistent with the +equivalent scalar instructions. The old `vmandnot` and `vmornot` +mnemonics can be retained as assembler aliases for compatibility. + +Several assembler pseudoinstructions are defined as shorthand for +common uses of mask logical operations: +---- + vmmv.m vd, vs => vmand.mm vd, vs, vs # Copy mask register + vmclr.m vd => vmxor.mm vd, vd, vd # Clear mask register + vmset.m vd => vmxnor.mm vd, vd, vd # Set mask register + vmnot.m vd, vs => vmnand.mm vd, vs, vs # Invert bits +---- + +NOTE: The `vmmv.m` instruction was previously called `vmcpy.m`, but +with new layout it is more consistent to name as a "mv" because bits +are copied without interpretation. The `vmcpy.m` assembler +pseudoinstruction can be retained for compatibility. 
For +implementations that internally rearrange bits according to EEW, a +`vmmv.m` instruction with the same source and destination can be used as +an idiom to force an internal reformat into a mask vector. + +The set of eight mask logical instructions can generate any of the 16 +possible binary logical functions of the two input masks: + +[cols="1,1,1,1,12"] +|=== +4+| inputs | + +| 0 | 0 | 1 | 1 | src1 +| 0 | 1 | 0 | 1 | src2 +|=== + +[cols="1,1,1,1,6,6"] +|=== +4+| output | instruction | pseudoinstruction + +| 0 | 0 | 0 | 0 | vmxor.mm vd, vd, vd | vmclr.m vd +| 1 | 0 | 0 | 0 | vmnor.mm vd, src1, src2 | +| 0 | 1 | 0 | 0 | vmandn.mm vd, src2, src1 | +| 1 | 1 | 0 | 0 | vmnand.mm vd, src1, src1 | vmnot.m vd, src1 +| 0 | 0 | 1 | 0 | vmandn.mm vd, src1, src2 | +| 1 | 0 | 1 | 0 | vmnand.mm vd, src2, src2 | vmnot.m vd, src2 +| 0 | 1 | 1 | 0 | vmxor.mm vd, src1, src2 | +| 1 | 1 | 1 | 0 | vmnand.mm vd, src1, src2 | +| 0 | 0 | 0 | 1 | vmand.mm vd, src1, src2 | +| 1 | 0 | 0 | 1 | vmxnor.mm vd, src1, src2 | +| 0 | 1 | 0 | 1 | vmand.mm vd, src2, src2 | vmmv.m vd, src2 +| 1 | 1 | 0 | 1 | vmorn.mm vd, src2, src1 | +| 0 | 0 | 1 | 1 | vmand.mm vd, src1, src1 | vmmv.m vd, src1 +| 1 | 0 | 1 | 1 | vmorn.mm vd, src1, src2 | +| 0 | 1 | 1 | 1 | vmor.mm vd, src1, src2 | +| 1 | 1 | 1 | 1 | vmxnor.mm vd, vd, vd | vmset.m vd +|=== + +NOTE: The vector mask logical instructions are designed to be easily +fused with a following masked vector operation to effectively expand +the number of predicate registers by moving values into `v0` before +use. + + +==== Vector count population in mask `vcpop.m` + +---- + vcpop.m rd, vs2, vm +---- + +NOTE: This instruction previously had the assembler mnemonic `vpopc.m` +but was renamed to be consistent with the scalar instruction. The +assembler instruction alias `vpopc.m` is being retained for software +compatibility. + +The source operand is a single vector register holding mask register +values as described in Section <>.
+ +The `vcpop.m` instruction counts the number of active mask elements +of the vector source mask register that have the value +1 and writes the result to a scalar `x` register. + +The operation can be performed under a mask, in which case only the +elements enabled by the mask are counted. + +---- + vcpop.m rd, vs2, v0.t # x[rd] = sum_i ( vs2.mask[i] && v0.mask[i] ) +---- + +The `vcpop.m` instruction writes `x[rd]` even if `vl`=0 (with the +value 0, since no mask elements are active). + +Traps on `vcpop.m` are always reported with a `vstart` of 0. The +`vcpop.m` instruction will raise an illegal instruction exception if +`vstart` is non-zero. + +==== `vfirst` find-first-set mask bit + +---- + vfirst.m rd, vs2, vm +---- + +The `vfirst` instruction finds the lowest-numbered active element of +the source mask vector that has the value 1 and writes that element's +index to a GPR. If no active element has the value 1, -1 is written +to the GPR. + +NOTE: Software can assume that any negative value (highest bit set) +corresponds to no element found, as vector lengths will never reach +2^(XLEN-1)^ on any implementation. + +The `vfirst.m` instruction writes `x[rd]` even if `vl`=0 (with the +value -1, since no mask elements are active). + +Traps on `vfirst` are always reported with a `vstart` of 0. The +`vfirst` instruction will raise an illegal instruction exception if +`vstart` is non-zero. + +==== `vmsbf.m` set-before-first mask bit + +---- + vmsbf.m vd, vs2, vm + + # Example + + 7 6 5 4 3 2 1 0 Element number + + 1 0 0 1 0 1 0 0 v3 contents + vmsbf.m v2, v3 + 0 0 0 0 0 0 1 1 v2 contents + + 1 0 0 1 0 1 0 1 v3 contents + vmsbf.m v2, v3 + 0 0 0 0 0 0 0 0 v2 + + 0 0 0 0 0 0 0 0 v3 contents + vmsbf.m v2, v3 + 1 1 1 1 1 1 1 1 v2 + + 1 1 0 0 0 0 1 1 v0 contents + 1 0 0 1 0 1 0 0 v3 contents + vmsbf.m v2, v3, v0.t + 0 1 x x x x 1 1 v2 contents +---- + +The `vmsbf.m` instruction takes a mask register as input and writes +results to a mask register.
The instruction writes a 1 to all active +mask elements before the first active source element that is a 1, then +writes a 0 to that element and all following active elements. If +there is no set bit in the active elements of the source vector, then +all active elements in the destination are written with a 1. + +The tail elements in the destination mask register are updated under a +tail-agnostic policy. + +Traps on `vmsbf.m` are always reported with a `vstart` of 0. The +`vmsbf` instruction will raise an illegal instruction exception if +`vstart` is non-zero. + +The destination register cannot overlap the source register +and, if masked, cannot overlap the mask register (`v0`). + +==== `vmsif.m` set-including-first mask bit + +The vector mask set-including-first instruction is similar to +set-before-first, except it also includes the element with a set bit. + +---- + vmsif.m vd, vs2, vm + + # Example + + 7 6 5 4 3 2 1 0 Element number + + 1 0 0 1 0 1 0 0 v3 contents + vmsif.m v2, v3 + 0 0 0 0 0 1 1 1 v2 contents + + 1 0 0 1 0 1 0 1 v3 contents + vmsif.m v2, v3 + 0 0 0 0 0 0 0 1 v2 + + 1 1 0 0 0 0 1 1 v0 contents + 1 0 0 1 0 1 0 0 v3 contents + vmsif.m v2, v3, v0.t + 1 1 x x x x 1 1 v2 contents +---- + +The tail elements in the destination mask register are updated under a +tail-agnostic policy. + +Traps on `vmsif.m` are always reported with a `vstart` of 0. The +`vmsif` instruction will raise an illegal instruction exception if +`vstart` is non-zero. + +The destination register cannot overlap the source register +and, if masked, cannot overlap the mask register (`v0`). + +==== `vmsof.m` set-only-first mask bit + +The vector mask set-only-first instruction is similar to +set-before-first, except it only sets the first element with a bit +set, if any.
+ +---- + vmsof.m vd, vs2, vm + + # Example + + 7 6 5 4 3 2 1 0 Element number + + 1 0 0 1 0 1 0 0 v3 contents + vmsof.m v2, v3 + 0 0 0 0 0 1 0 0 v2 contents + + 1 0 0 1 0 1 0 1 v3 contents + vmsof.m v2, v3 + 0 0 0 0 0 0 0 1 v2 + + 1 1 0 0 0 0 1 1 v0 contents + 1 1 0 1 0 1 0 0 v3 contents + vmsof.m v2, v3, v0.t + 0 1 x x x x 0 0 v2 contents +---- + +The tail elements in the destination mask register are updated under a +tail-agnostic policy. + +Traps on `vmsof.m` are always reported with a `vstart` of 0. The +`vmsof` instruction will raise an illegal instruction exception if +`vstart` is non-zero. + +The destination register cannot overlap the source register +and, if masked, cannot overlap the mask register (`v0`). + +==== Example using vector mask instructions + +The following is an example of vectorizing a data-dependent exit loop. + +---- +include::example/strcpy.s[lines=4..-1] +---- +---- +include::example/strncpy.s[lines=4..-1] +---- + +==== Vector Iota Instruction + +The `viota.m` instruction reads a source vector mask register and +writes to each element of the destination vector register group the +sum of all the bits of elements in the mask register +whose index is less than the element, i.e., a parallel prefix sum of +the mask values. + +This instruction can be masked, in which case only the enabled +elements contribute to the sum. + +---- + viota.m vd, vs2, vm + + # Example + + 7 6 5 4 3 2 1 0 Element number + + 1 0 0 1 0 0 0 1 v2 contents + viota.m v4, v2 # Unmasked + 2 2 2 1 1 1 1 0 v4 result + + 1 1 1 0 1 0 1 1 v0 contents + 1 0 0 1 0 0 0 1 v2 contents + 2 3 4 5 6 7 8 9 v4 contents + viota.m v4, v2, v0.t # Masked, vtype.vma=0 + 1 1 1 5 1 7 1 0 v4 results +---- + +The result value is zero-extended to fill the destination element if +SEW is wider than the result. If the result value would overflow the +destination SEW, the least-significant SEW bits are retained.
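The prefix-sum behavior described above can be checked against a small executable model. This is a minimal sketch rather than normative text: the function name `viota` and its parameters are illustrative, lists are indexed with element 0 first (the reverse of the diagrams above), and inactive and tail elements are simply left undisturbed, which is one outcome permitted under either policy.

```python
def viota(vs2_mask, vl, vd_old, v0_mask=None, sew=8):
    """Model of viota.m: exclusive prefix sum of the active source mask bits.

    vs2_mask : source mask bits, element 0 first
    vl       : number of body elements
    vd_old   : previous destination contents (kept for inactive/tail elements)
    v0_mask  : optional mask; None models the unmasked form
    sew      : element width in bits; results wrap to the low SEW bits
    """
    vd = list(vd_old)
    count = 0
    for i in range(vl):
        active = v0_mask is None or bool(v0_mask[i])
        if active:
            vd[i] = count % (1 << sew)   # keep the least-significant SEW bits
            count += vs2_mask[i]         # only enabled elements contribute
    return vd

# The two examples from the listing above, rewritten with element 0 first.
v2 = [1, 0, 0, 0, 1, 0, 0, 1]
unmasked = viota(v2, 8, [0] * 8)
masked = viota(v2, 8, [9, 8, 7, 6, 5, 4, 3, 2], v0_mask=[1, 1, 0, 1, 0, 1, 1, 1])
```

Reading `unmasked` and `masked` from the last element to the first reproduces the `v4` rows in the example.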
+ +Traps on `viota.m` are always reported with a `vstart` of 0, and +execution is always restarted from the beginning when resuming after a +trap handler. An illegal instruction exception is raised if `vstart` +is non-zero. + +The destination register group cannot overlap the source register +and, if masked, cannot overlap the mask register (`v0`). + +The `viota.m` instruction can be combined with memory scatter +instructions (indexed stores) to perform vector compress functions. + +---- + # Compact non-zero elements from input memory array to output memory array + # + # size_t compact_non_zero(size_t n, const int* in, int* out) + # { + # size_t i; + # size_t count = 0; + # int *p = out; + # + # for (i=0; i XLEN, the +least-significant XLEN bits are transferred and the upper SEW-XLEN bits are +ignored. If SEW < XLEN, the value is sign-extended to XLEN bits. + +NOTE: `vmv.x.s` performs its operation even if `vstart` {ge} `vl` or `vl`=0. + +The `vmv.s.x` instruction copies the scalar integer register to element 0 of +the destination vector register. If SEW < XLEN, the least-significant bits +are copied and the upper XLEN-SEW bits are ignored. If SEW > XLEN, the value +is sign-extended to SEW bits. The other elements in the destination vector +register ( 0 < index < VLEN/SEW) are treated as tail elements using the current tail agnostic/undisturbed policy. If `vstart` {ge} `vl`, no +operation is performed and the destination register is not updated. + +NOTE: As a consequence, when `vl`=0, no elements are updated in the +destination vector register group, regardless of `vstart`. + +The encodings corresponding to the masked versions (`vm=0`) of `vmv.x.s` +and `vmv.s.x` are reserved. + +==== Floating-Point Scalar Move Instructions + +The floating-point scalar read/write instructions transfer a single +value between a scalar `f` register and element 0 of a vector +register. The instructions ignore LMUL and vector register groups. 
+ +---- +vfmv.f.s rd, vs2 # f[rd] = vs2[0] (rs1=0) +vfmv.s.f vd, rs1 # vd[0] = f[rs1] (vs2=0) +---- + +The `vfmv.f.s` instruction copies a single SEW-wide element from index +0 of the source vector register to a destination scalar floating-point +register. + +NOTE: `vfmv.f.s` performs its operation even if `vstart` {ge} `vl` or `vl`=0. + +The `vfmv.s.f` instruction copies the scalar floating-point register +to element 0 of the destination vector register. The other elements +in the destination vector register ( 0 < index < VLEN/SEW) are treated +as tail elements using the current tail agnostic/undisturbed policy. +If `vstart` {ge} `vl`, no operation is performed and the destination +register is not updated. + +NOTE: As a consequence, when `vl`=0, no elements are updated in the +destination vector register group, regardless of `vstart`. + +The encodings corresponding to the masked versions (`vm=0`) of `vfmv.f.s` +and `vfmv.s.f` are reserved. + +==== Vector Slide Instructions + +The slide instructions move elements up and down a vector register +group. + +NOTE: The slide operations can be implemented much more efficiently +than using the arbitrary register gather instruction. Implementations +may optimize certain OFFSET values for `vslideup` and `vslidedown`. +In particular, power-of-2 offsets may operate substantially faster +than other offsets. + +For all of the `vslideup`, `vslidedown`, `v[f]slide1up`, and +`v[f]slide1down` instructions, if `vstart` {ge} `vl`, the instruction performs no +operation and leaves the destination vector register unchanged. + +NOTE: As a consequence, when `vl`=0, no elements are updated in the +destination vector register group, regardless of `vstart`. + +The tail agnostic/undisturbed policy is followed for tail elements. + +The slide instructions may be masked, with mask element _i_ +controlling whether _destination_ element _i_ is written. The mask +undisturbed/agnostic policy is followed for inactive elements. 
+ +===== Vector Slideup Instructions + +---- + vslideup.vx vd, vs2, rs1, vm # vd[i+x[rs1]] = vs2[i] + vslideup.vi vd, vs2, uimm, vm # vd[i+uimm] = vs2[i] +---- + +For `vslideup`, the value in `vl` specifies the maximum number of destination +elements that are written. The start index (_OFFSET_) for the +destination can be specified using either an unsigned integer in the +`x` register specified by `rs1`, or a 5-bit immediate, zero-extended to XLEN bits. +If XLEN > SEW, _OFFSET_ is _not_ truncated to SEW bits. +Destination elements _OFFSET_ through `vl`-1 are written if unmasked and +if _OFFSET_ < `vl`. + +---- + vslideup behavior for destination elements + + OFFSET is amount to slideup, either from x register or a 5-bit immediate + + 0 <= i < max(vstart, OFFSET) Unchanged + max(vstart, OFFSET) <= i < vl vd[i] = vs2[i-OFFSET] if v0.mask[i] enabled + vl <= i < VLMAX Follow tail policy +---- + +The destination vector register group for `vslideup` cannot overlap +the source vector register group, otherwise the instruction encoding +is reserved. + +NOTE: The non-overlap constraint avoids WAR hazards on the +input vectors during execution, and enables restart with non-zero +`vstart`. + +===== Vector Slidedown Instructions + +---- + vslidedown.vx vd, vs2, rs1, vm # vd[i] = vs2[i+x[rs1]] + vslidedown.vi vd, vs2, uimm, vm # vd[i] = vs2[i+uimm] +---- + +For `vslidedown`, the value in `vl` specifies the maximum number of +destination elements that are written. The remaining elements past +`vl` are handled according to the current tail policy (Section +<>). + +The start index (_OFFSET_) for the source can be specified +using either an unsigned integer in the `x` register specified by `rs1`, or a +5-bit immediate, zero-extended to XLEN bits. +If XLEN > SEW, _OFFSET_ is _not_ truncated to SEW bits.
+ +---- + vslidedown behavior for source elements for element i in slide + 0 <= i+OFFSET < VLMAX src[i] = vs2[i+OFFSET] + VLMAX <= i+OFFSET src[i] = 0 + + vslidedown behavior for destination element i in slide + 0 <= i < vstart Unchanged + vstart <= i < vl vd[i] = src[i] if v0.mask[i] enabled + vl <= i < VLMAX Follow tail policy + +---- + +===== Vector Slide1up + +Variants of slide are provided that only move by one element but which +also allow a scalar integer value to be inserted at the vacated +element position. + +---- + vslide1up.vx vd, vs2, rs1, vm # vd[0]=x[rs1], vd[i+1] = vs2[i] +---- + +The `vslide1up` instruction places the `x` register argument at +location 0 of the destination vector register group, provided that +element 0 is active, otherwise the destination element update follows the +current mask agnostic/undisturbed policy. If XLEN < SEW, the value is +sign-extended to SEW bits. If XLEN > SEW, the least-significant bits +are copied over and the high XLEN-SEW bits are ignored. + +The remaining active `vl`-1 elements are copied over from index _i_ in +the source vector register group to index _i_+1 in the destination +vector register group. + +The `vl` register specifies the maximum number of destination vector +register elements updated with source values, and remaining elements +past `vl` are handled according to the current tail policy (Section +<>). + + +---- + vslide1up behavior when vl > 0 + + i < vstart unchanged + 0 = i = vstart vd[i] = x[rs1] if v0.mask[i] enabled + max(vstart, 1) <= i < vl vd[i] = vs2[i-1] if v0.mask[i] enabled + vl <= i < VLMAX Follow tail policy +---- + +The `vslide1up` instruction requires that the destination vector +register group does not overlap the source vector register group. +Otherwise, the instruction encoding is reserved.
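The `vslidedown` and `vslide1up` behavior tables above can be expressed as a short executable model. This is a sketch under stated assumptions, not a definitive implementation: the function names and parameters are illustrative, lists hold VLMAX elements with element 0 first, inactive and tail elements are left undisturbed (permitted under either policy), and truncation or sign-extension of the scalar operand is not modeled.

```python
def vslidedown(vs2, offset, vl, vd_old, vstart=0, mask=None):
    """Model of vslidedown: vd[i] = vs2[i+OFFSET], or 0 when i+OFFSET >= VLMAX."""
    vlmax = len(vs2)
    vd = list(vd_old)
    for i in range(vstart, vl):            # empty range when vstart >= vl
        if mask is None or mask[i]:
            src = i + offset
            vd[i] = vs2[src] if src < vlmax else 0
    return vd

def vslide1up(vs2, x, vl, vd_old, vstart=0, mask=None):
    """Model of vslide1up: vd[0] = x, vd[i] = vs2[i-1] for 1 <= i < vl."""
    vd = list(vd_old)
    for i in range(vstart, vl):
        if mask is None or mask[i]:
            vd[i] = x if i == 0 else vs2[i - 1]
    return vd

down = vslidedown([10, 11, 12, 13, 14, 15, 16, 17], 3, 6, [0] * 8)
up = vslide1up([10, 11, 12, 13], 99, 4, [0] * 4)
up_masked = vslide1up([10, 11, 12, 13], 99, 4, [7] * 4, mask=[0, 1, 1, 1])
```

In `up_masked`, element 0 is inactive, so the scalar is not inserted and the old destination value is kept.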
+ +[[sec-vfslide1up]] +===== Vector Floating-Point Slide1up Instruction + +---- + vfslide1up.vf vd, vs2, rs1, vm # vd[0]=f[rs1], vd[i+1] = vs2[i] +---- + +The `vfslide1up` instruction is defined analogously to `vslide1up`, +but sources its scalar argument from an `f` register. + +===== Vector Slide1down Instruction + +The `vslide1down` instruction copies the first `vl`-1 active element +values from index _i_+1 in the source vector register group to index +_i_ in the destination vector register group. + +The `vl` register specifies the maximum number of destination vector +register elements written with source values, and remaining elements +past `vl` are handled according to the current tail policy (Section +<>). + +---- + vslide1down.vx vd, vs2, rs1, vm # vd[i] = vs2[i+1], vd[vl-1]=x[rs1] +---- + +The `vslide1down` instruction places the `x` register argument at +location `vl`-1 in the destination vector register, provided that +element `vl-1` is active, otherwise the destination element update +follows the current mask agnostic/undisturbed policy. +If XLEN < SEW, the value is sign-extended to SEW bits. If +XLEN > SEW, the least-significant bits are copied over and the high +XLEN-SEW bits are ignored. + +---- + vslide1down behavior + + i < vstart unchanged + vstart <= i < vl-1 vd[i] = vs2[i+1] if v0.mask[i] enabled + vstart <= i = vl-1 vd[vl-1] = x[rs1] if v0.mask[i] enabled + vl <= i < VLMAX Follow tail policy +---- + +NOTE: The `vslide1down` instruction can be used to load values into a +vector register without using memory and without disturbing other +vector registers. This provides a path for debuggers to modify the +contents of a vector register, albeit slowly, with multiple repeated +`vslide1down` invocations.
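The debugger use case in the note above can be sketched with a small model of `vslide1down`. The names `vslide1down` and `load_via_slides` are hypothetical helpers introduced for illustration; the model keeps element 0 first, leaves inactive and tail elements undisturbed, and does not model scalar truncation or sign-extension.

```python
def vslide1down(vs2, x, vl, vd_old, vstart=0, mask=None):
    """Model of vslide1down: vd[i] = vs2[i+1] for i < vl-1, vd[vl-1] = x."""
    vd = list(vd_old)
    for i in range(vstart, vl):            # empty range when vstart >= vl
        if mask is None or mask[i]:
            vd[i] = x if i == vl - 1 else vs2[i + 1]
    return vd

def load_via_slides(values, reg):
    """Fill the first len(values) elements of reg using only slide1down steps.

    Each step uses the same register as source and destination, shifting the
    earlier scalars toward element 0 and inserting the new one at vl-1.
    """
    vl = len(values)
    for v in values:
        reg = vslide1down(reg, v, vl, reg)
    return reg

loaded = load_via_slides([1, 2, 3], [0, 0, 0, 0])
```

After `vl` steps the first scalar supplied ends up in element 0, and the element past `vl` is untouched.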
+ +[[sec-vfslide1down]] +===== Vector Floating-Point Slide1down Instruction + +---- + vfslide1down.vf vd, vs2, rs1, vm # vd[i] = vs2[i+1], vd[vl-1]=f[rs1] +---- + +The `vfslide1down` instruction is defined analogously to `vslide1down`, +but sources its scalar argument from an `f` register. + +==== Vector Register Gather Instructions + +The vector register gather instructions read elements from a first +source vector register group at locations given by a second source +vector register group. The index values in the second vector are +treated as unsigned integers. The source vector can be read at any +index < VLMAX regardless of `vl`. The maximum number of elements to write to +the destination register is given by `vl`, and the remaining elements +past `vl` are handled according to the current tail policy +(Section <>). The operation can be masked, and the mask +undisturbed/agnostic policy is followed for inactive elements. + +---- +vrgather.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]]; +vrgatherei16.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]]; +---- + +The `vrgather.vv` form uses SEW/LMUL for both the data and +indices. The `vrgatherei16.vv` form uses SEW/LMUL for the data in +`vs2` but EEW=16 and EMUL = (16/SEW)*LMUL for the indices in `vs1`. + +NOTE: When SEW=8, `vrgather.vv` can only reference vector elements +0-255. The `vrgatherei16` form can index 64K elements, and can also +be used to reduce the register capacity needed to hold indices when +SEW > 16. + +If an element index is out of range ( `vs1[i]` {ge} VLMAX ) +then zero is returned for the element value. + +Vector-scalar and vector-immediate forms of the register gather are +also provided. These read one element from the source vector at the +given index, and write this value to the active elements +of the destination vector register. The index value in the scalar +register and the immediate, zero-extended to XLEN bits, are treated as +unsigned integers. 
If XLEN > SEW, the index value is _not_ truncated +to SEW bits. + +NOTE: These forms allow any vector element to be "splatted" to an entire vector. + +---- +vrgather.vx vd, vs2, rs1, vm # vd[i] = (x[rs1] >= VLMAX) ? 0 : vs2[x[rs1]] +vrgather.vi vd, vs2, uimm, vm # vd[i] = (uimm >= VLMAX) ? 0 : vs2[uimm] +---- + +For any `vrgather` instruction, the destination vector register group +cannot overlap with the source vector register groups, otherwise the +instruction encoding is reserved. + +==== Vector Compress Instruction + +The vector compress instruction allows elements selected by a vector +mask register from a source vector register group to be packed into +contiguous elements at the start of the destination vector register +group. + +---- + vcompress.vm vd, vs2, vs1 # Compress into vd elements of vs2 where vs1 is enabled +---- + +The vector mask register specified by `vs1` indicates which of the +first `vl` elements of vector register group `vs2` should be extracted +and packed into contiguous elements at the beginning of vector +register `vd`. The remaining elements of `vd` are treated as tail +elements according to the current tail policy (Section +<>). + +---- + Example use of vcompress instruction + + 8 7 6 5 4 3 2 1 0 Element number + + 1 1 0 1 0 0 1 0 1 v0 + 8 7 6 5 4 3 2 1 0 v1 + 1 2 3 4 5 6 7 8 9 v2 + vsetivli t0, 9, e8, m1, tu, ma + vcompress.vm v2, v1, v0 + 1 2 3 4 8 7 5 2 0 v2 +---- + +`vcompress` is encoded as an unmasked instruction (`vm=1`). The equivalent +masked instruction (`vm=0`) is reserved. + +The destination vector register group cannot overlap the source vector +register group or the source mask register, otherwise the instruction +encoding is reserved. + +A trap on a `vcompress` instruction is always reported with a +`vstart` of 0. Executing a `vcompress` instruction with a non-zero +`vstart` raises an illegal instruction exception. 
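The packing performed by `vcompress` can be modeled in a few lines. This is a minimal sketch, assuming element 0 is stored first in each list and that tail elements are left undisturbed (as with the `tu` setting in the example above); the function name and parameters are illustrative and not part of the specification.

```python
def vcompress(vs2, vs1_mask, vl, vd_old):
    """Model of vcompress.vm: pack vs2[i] where vs1_mask[i] is set, for i < vl."""
    vd = list(vd_old)
    j = 0                                  # next free destination index
    for i in range(vl):
        if vs1_mask[i]:
            vd[j] = vs2[i]
            j += 1
    return vd                              # elements from index j up keep old values

# The example above, rewritten with element 0 first.
v0 = [1, 0, 1, 0, 0, 1, 0, 1, 1]
v1 = [0, 1, 2, 3, 4, 5, 6, 7, 8]
v2_old = [9, 8, 7, 6, 5, 4, 3, 2, 1]
v2_new = vcompress(v1, v0, 9, v2_old)
```

Reading `v2_new` from the last element to the first gives the final `v2` row of the example.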
+ +NOTE: Although possible, `vcompress` is one of the more difficult +instructions to restart with a non-zero `vstart`, so the assumption is that +implementations will choose not to do that but will instead restart from +element 0. This does mean that elements in the destination register after +`vstart` will already have been updated. + +===== Synthesizing `vdecompress` + +There is no inverse `vdecompress` provided, as this operation can be +readily synthesized using iota and a masked vrgather: + +---- + Desired functionality of 'vdecompress' + 7 6 5 4 3 2 1 0 # vid + + e d c b a # packed vector of 5 elements + 1 0 0 1 1 1 0 1 # mask vector of 8 elements + p q r s t u v w # destination register before vdecompress + + e q r d c b v a # result of vdecompress +---- + +---- + # v0 holds mask + # v1 holds packed data + # v11 holds input expanded vector and result + viota.m v10, v0 # Calc iota from mask in v0 + vrgather.vv v11, v1, v10, v0.t # Expand into destination +---- +---- + p q r s t u v w # v11 destination register + e d c b a # v1 source vector + 1 0 0 1 1 1 0 1 # v0 mask vector + + 4 4 4 3 2 1 1 0 # v10 result of viota.m + e q r d c b v a # v11 destination after vrgather using viota.m under mask +---- + +==== Whole Vector Register Move + +The `vmv<nr>r.v` instructions copy whole vector registers (i.e., all +VLEN bits) and can copy whole vector register groups. The `nr` value +in the opcode is the number of individual vector registers, NREG, to +copy. The instructions operate as if EEW=SEW, EMUL = NREG, effective +length `evl`= EMUL * VLEN/SEW. + +NOTE: These instructions are intended to aid compilers to shuffle +vector registers without needing to know or change `vl` or `vtype`. + +NOTE: The usual property that no elements are written if `vstart` {ge} `vl` +does not apply to these instructions. +Instead, no elements are written if `vstart` {ge} `evl`.
+ +NOTE: If `vd` is equal to `vs2` the instruction is an architectural +NOP, but is treated as a hint to implementations that rearrange data +internally that the register group will next be accessed with an EEW +equal to SEW. + +The instruction is encoded as an OPIVI instruction. The number of +vector registers to copy is encoded in the low three bits of the +`simm` field (`simm[2:0]`) using the same encoding as the `nf[2:0]` field for memory +instructions (Figure <>), i.e., `simm[2:0]` = NREG-1. + +The value of NREG must be 1, 2, 4, or 8, and values of `simm[4:0]` +other than 0, 1, 3, and 7 are reserved. + +NOTE: A future extension may support other numbers of registers to be moved. + +NOTE: The instruction uses the same funct6 encoding as the `vsmul` +instruction but with an immediate operand, and only the unmasked +version (`vm=1`). This encoding is chosen as it is close to the +related `vmerge` encoding, and it is unlikely the `vsmul` instruction +would benefit from an immediate form. + +---- + vmv<nr>r.v vd, vs2 # General form + + vmv1r.v v1, v2 # Copy v1=v2 + vmv2r.v v10, v12 # Copy v10=v12; v11=v13 + vmv4r.v v4, v8 # Copy v4=v8; v5=v9; v6=v10; v7=v11 + vmv8r.v v0, v8 # Copy v0=v8; v1=v9; ...; v7=v15 +---- + +The source and destination vector register numbers must be aligned +appropriately for the vector register group size, and encodings with +other vector register numbers are reserved. + +NOTE: A future extension may relax the vector register alignment +restrictions. + +=== Exception Handling + +On a trap during a vector instruction (caused by either a synchronous +exception or an asynchronous interrupt), the existing `*epc` CSR is +written with a pointer to the trapping vector instruction, while the +`vstart` CSR contains the element index on which the trap was +taken. + +NOTE: We chose to add a `vstart` CSR to allow resumption of a +partially executed vector instruction to reduce interrupt latencies +and to simplify forward-progress guarantees.
This is similar to the +scheme in the IBM 3090 vector facility. To ensure forward progress +without the `vstart` CSR, implementations would have to guarantee an +entire vector instruction can always complete atomically without +generating a trap. This is particularly difficult to ensure in the +presence of strided or scatter/gather operations and demand-paged +virtual memory. + +==== Precise vector traps + +NOTE: We assume most supervisor-mode environments with demand-paging +will require precise vector traps. + +Precise vector traps require that: + +. all instructions older than the trapping vector instruction have committed their results +. no instructions newer than the trapping vector instruction have altered architectural state +. any operations within the trapping vector instruction affecting result elements preceding the index in the `vstart` CSR have committed their results +. no operations within the trapping vector instruction affecting elements at or following the `vstart` CSR have altered architectural state except if restarting and completing the affected vector instruction will nevertheless produce the correct final state. + +We relax the last requirement to allow elements following `vstart` to +have been updated at the time the trap is reported, provided that +re-executing the instruction from the given `vstart` will correctly +overwrite those elements. + +In idempotent memory regions, vector store instructions may have +updated elements in memory past the element causing a synchronous +trap. Non-idempotent memory regions must not have been updated for +indices equal to or greater than the element that caused a synchronous +trap during a vector store instruction. + +Except where noted above, vector instructions are allowed to overwrite +their inputs, and so in most cases, the vector instruction restart +must be from the `vstart` element index. 
However, there are a number of +cases where this overwrite is prohibited to enable execution of the +vector instructions to be idempotent and hence restartable from an +earlier index location. + +Implementations must ensure forward progress can be eventually +guaranteed for the element or segment reported by `vstart`. + +==== Imprecise vector traps + +Imprecise vector traps are traps that are not precise. In particular, +instructions newer than `*epc` may have committed results, and +instructions older than `*epc` may have not completed execution. +Imprecise traps are primarily intended to be used in situations where +reporting an error and terminating execution is the appropriate +response. + +NOTE: A profile might specify that interrupts are precise while other +traps are imprecise. We assume many embedded implementations will +generate only imprecise traps for vector instructions on fatal errors, +as they will not require resumable traps. + +Imprecise traps shall report the faulting element in `vstart` for +traps caused by synchronous vector exceptions. + +There is no support for imprecise traps in the current standard extensions. + +==== Selectable precise/imprecise traps + +Some profiles may choose to provide a privileged mode bit to select +between precise and imprecise vector traps. Imprecise mode would run +at high-performance but possibly make it difficult to discern error +causes, while precise mode would run more slowly, but support +debugging of errors albeit with a possibility of not experiencing the +same errors as in imprecise mode. + +This mechanism is not defined in the current standard extensions. + +==== Swappable traps + +Another trap mode can support swappable state in the vector unit, +where on a trap, special instructions can save and restore the vector +unit microarchitectural state, to allow execution to continue +correctly around imprecise traps. + +This mechanism is not defined in the current standard extensions. 
+ +NOTE: A future extension might define a standard way of saving and +restoring opaque microarchitectural state from a vector unit +implementation to support context switching with imprecise traps. + +[[sec-vector-extensions]] +=== Standard Vector Extensions + +This section describes the standard vector extensions to be proposed +for public review. A set of smaller extensions intended for embedded +use are named with a "Zve" prefix, while a larger vector extension +designed for application processors is named as a single-letter V +extension. A set of vector length extension names with prefix "Zvl" +are also provided. + +The initial vector extensions are designed to act as a base for +additional vector extensions in various domains, including +cryptography and machine learning. + +==== Zvl*: Minimum Vector Length Standard Extensions + +All standard vector extensions have a minimum required VLEN as +described below. A set of vector length extensions are provided to +increase the minimum vector length of a vector extension. + +NOTE: The vector length extensions can be used to either specify +additional software or architecture profile requirements, or to +advertise hardware capabilities. + +.Vector length extensions +[cols="1,1"] +[%autowidth] +|=== +| Extension | Minimum VLEN + +| Zvl32b | 32 +| Zvl64b | 64 +| Zvl128b | 128 +| Zvl256b | 256 +| Zvl512b | 512 +| Zvl1024b | 1024 +|=== + +NOTE: Longer vector length extensions should follow the same pattern. + +NOTE: Every vector length extension effectively includes all shorter +vector length extensions. + +NOTE: The syntax for extension names is being revised, and these names +are subject to change. The trailing "b" will be required to +disambiguate numeric fields from version numbers. + +NOTE: Explicit use of the Zvl32b extension string is not required for +any standard vector extension as they all effectively mandate at least +this minimum, but the string can be useful when stating hardware +capabilities. 
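
The naming pattern in the vector length table can be sketched in Python. This is an illustrative model only, not part of the specification: the helper names `zvl_min_vlen` and `implied_zvl` are hypothetical, and the parsing assumes the current `Zvl<N>b` spelling, which the text notes is subject to change.

```python
import re

# Hypothetical helpers, for illustration only; these names are not from the spec.
_ZVL_RE = re.compile(r"^zvl(\d+)b$", re.IGNORECASE)


def zvl_min_vlen(name):
    """Return the minimum VLEN (in bits) advertised by a Zvl*b extension name."""
    m = _ZVL_RE.match(name)
    if m is None:
        raise ValueError(f"not a Zvl*b extension name: {name!r}")
    vlen = int(m.group(1))
    # VLEN must be a power of two; Zvl32b is the smallest extension listed.
    if vlen < 32 or vlen & (vlen - 1):
        raise ValueError(f"unsupported minimum VLEN: {vlen}")
    return vlen


def implied_zvl(name):
    """List the shorter Zvl*b extensions that `name` effectively includes."""
    vlen = zvl_min_vlen(name)
    shorter = []
    n = 32
    while n < vlen:
        shorter.append(f"Zvl{n}b")
        n *= 2
    return shorter
```

Under these assumptions, `implied_zvl("Zvl128b")` lists `Zvl32b` and `Zvl64b`, matching the note that every vector length extension effectively includes all shorter ones.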
+ +==== Zve*: Vector Extensions for Embedded Processors + +The following five standard extensions are defined to provide varying +degrees of vector support and are intended for use with embedded +processors. Any of these extensions can be added to base ISAs with +XLEN=32 or XLEN=64. The table lists the minimum VLEN and supported +EEWs for each extension as well as what floating-point types are +supported. + +.Embedded vector extensions +[cols="1,1,2,1,1"] +[%autowidth] +|=== +| Extension | Minimum VLEN | Supported EEW | FP32 | FP64 + +| Zve32x | 32 | 8, 16, 32 | N | N +| Zve32f | 32 | 8, 16, 32 | Y | N +| Zve64x | 64 | 8, 16, 32, 64 | N | N +| Zve64f | 64 | 8, 16, 32, 64 | Y | N +| Zve64d | 64 | 8, 16, 32, 64 | Y | Y +|=== + +The Zve32f and Zve64x extensions depend on the Zve32x extension. +The Zve64f extension depends on the Zve32f and Zve64x extensions. +The Zve64d extension depends on the Zve64f extension. + +All Zve* extensions have precise traps. + +NOTE: There is currently no standard support for handling imprecise +traps, so standard extensions have to provide precise traps. + +All Zve* extensions provide support for EEW of 8, 16, and 32, and +Zve64* extensions also support EEW of 64. + +All Zve* extensions support the vector configuration instructions +(Section <>). + +All Zve* extensions support all vector load and store instructions +(Section <>), except Zve64* extensions do not +support EEW=64 for index values when XLEN=32. + +All Zve* extensions support all vector integer instructions (Section +<>), except that the `vmulh` integer multiply +variants that return the high word of the product (`vmulh.vv`, +`vmulh.vx`, `vmulhu.vv`, `vmulhu.vx`, `vmulhsu.vv`, `vmulhsu.vx`) are +not included for EEW=64 in Zve64*. + +NOTE: Producing the high-word of a product can take substantial +additional gates for large EEW. 
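
The embedded extension table and the dependency rules above can be captured in a small Python model. This is a sketch under stated assumptions: the dictionary layout and the function names `zve_closure` and `supports_vmulh` are hypothetical, and only the facts stated in the table and the surrounding text are encoded.

```python
# Illustrative model of the "Embedded vector extensions" table.
# Layout: name -> (minimum VLEN, supported EEWs, FP32 supported, FP64 supported).
# The structure and names are assumptions made for this sketch.
ZVE = {
    "Zve32x": (32, (8, 16, 32), False, False),
    "Zve32f": (32, (8, 16, 32), True, False),
    "Zve64x": (64, (8, 16, 32, 64), False, False),
    "Zve64f": (64, (8, 16, 32, 64), True, False),
    "Zve64d": (64, (8, 16, 32, 64), True, True),
}

# Direct dependencies, as stated in the text.
ZVE_DEPENDS = {
    "Zve32x": (),
    "Zve32f": ("Zve32x",),
    "Zve64x": ("Zve32x",),
    "Zve64f": ("Zve32f", "Zve64x"),
    "Zve64d": ("Zve64f",),
}


def zve_closure(name):
    """Return `name` together with every Zve* extension it depends on."""
    seen = set()
    stack = [name]
    while stack:
        ext = stack.pop()
        if ext not in seen:
            seen.add(ext)
            stack.extend(ZVE_DEPENDS[ext])
    return seen


def supports_vmulh(name, eew):
    """True if the vmulh* high-word multiplies are included at this EEW.

    Per the text, these variants are not included for EEW=64 in Zve64*.
    """
    eews = ZVE[name][1]
    return eew in eews and eew != 64
```

For example, `zve_closure("Zve64d")` yields all five Zve* names, since Zve64d depends on Zve64f, which in turn depends on Zve32f and Zve64x.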
+ +All Zve* extensions support all vector fixed-point arithmetic +instructions (<>), except that `vsmul.vv` and +`vsmul.vx` are not included for EEW=64 in Zve64*. + +NOTE: As with `vmulh`, `vsmul` requires a large amount of additional +logic, and 64-bit fixed-point multiplies are relatively rare. + +All Zve* extensions support all vector integer single-width and +widening reduction operations (Sections <>, +<>). + +All Zve* extensions support all vector mask instructions (Section +<>). + +All Zve* extensions support all vector permutation instructions +(Section <>), except that Zve32x and Zve64x +do not include those with floating-point operands, and Zve64f does not include those +with EEW=64 floating-point operands. + +The Zve32f and Zve64f extensions depend upon the F extension, +and implement all +vector floating-point instructions (Section <>) for +floating-point operands with EEW=32. Vector single-width floating-point reduction +operations (<>) for EEW=32 are supported. + +The Zve64d extension depends upon the D extension, +and implements all vector +floating-point instructions (Section <>) for +floating-point operands with EEW=32 or EEW=64 (including widening +instructions and conversions between FP32 and FP64). Vector +single-width floating-point reductions (<>) +for EEW=32 and EEW=64 are supported as well as widening reductions +from FP32 to FP64. + +==== V: Vector Extension for Application Processors + +The single-letter V extension is intended for use in application +processor profiles. + +The `misa.v` bit is set for implementations providing `misa` and +supporting V. + +The V vector extension has precise traps. + +The V vector extension depends upon the Zvl128b and Zve64d extensions. + +NOTE: The value of 128 was chosen as a compromise for application +processors. Providing a larger VLEN allows stripmining code to be +elided in some cases for short vectors, but also increases the size of +the minimum implementation.
Note that larger LMUL can be used to +avoid stripmining for longer known-size application vectors at the +cost of having fewer available vector register groups. For example, an +LMUL of 8 allows vectors of up to sixteen 64-bit elements to be +processed without stripmining using four vector register groups. + +The V extension supports EEW of 8, 16, 32, and 64. + +The V extension supports the vector configuration instructions +(Section <>). + +The V extension supports all vector load and store instructions +(Section <>), except the V extension does not +support EEW=64 for index values when XLEN=32. + +The V extension supports all vector integer instructions (Section +<>). + +The V extension supports all vector fixed-point arithmetic +instructions (<>). + +The V extension supports all vector integer single-width and +widening reduction operations (Sections <>, +<>). + +The V extension supports all vector mask instructions (Section +<>). + +The V extension supports all vector permutation instructions (Section +<>). + +The V extension depends upon the F and D +extensions, and implements all vector floating-point instructions +(Section <>) for floating-point operands with EEW=32 +or EEW=64 (including widening instructions and conversions between +FP32 and FP64). Vector single-width floating-point reductions +(<>) for EEW=32 and EEW=64 are supported as +well as widening reductions from FP32 to FP64. + +NOTE: As is the case with other RISC-V extensions, it is valid to +include overlapping extensions in the same ISA string. For example, +RV64GCV and RV64GCV_Zve64f are both valid and equivalent ISA strings, +as is RV64GCV_Zve64f_Zve32x_Zvl128b. + +==== Zvfhmin: Vector Extension for Minimal Half-Precision Floating-Point + +The Zvfhmin extension provides minimal support for vectors of IEEE 754-2008 +binary16 values, adding conversions to and from binary32.
+When the Zvfhmin extension is implemented, the `vfwcvt.f.f.v` and +`vfncvt.f.f.w` instructions become defined when SEW=16. +The EEW=16 floating-point operands of these instructions use the binary16 +format. + +The Zvfhmin extension depends on the Zve32f extension. + +==== Zvfh: Vector Extension for Half-Precision Floating-Point + +The Zvfh extension provides support for vectors of IEEE 754-2008 +binary16 values. +When the Zvfh extension is implemented, all instructions in Sections +<>, <>, +<>, <>, +<>, and <> +become defined when SEW=16. +The EEW=16 floating-point operands of these instructions use the binary16 +format. + +Additionally, conversions between 8-bit integers and binary16 values are +provided. The floating-point-to-integer narrowing conversions +(`vfncvt[.rtz].x[u].f.w`) and integer-to-floating-point +widening conversions (`vfwcvt.f.x[u].v`) become defined when SEW=8. + +The Zvfh extension depends on the Zve32f and Zfhmin extensions. + +NOTE: Requiring basic scalar half-precision support makes Zvfh's +vector-scalar instructions substantially more useful. +We considered requiring more complete scalar half-precision support, but we +reasoned that, for many half-precision vector workloads, performing the scalar +computation in single-precision will suffice. + +=== Vector Instruction Listing + +include::inst-table.adoc[] + -- cgit v1.1 From bebbad41087bbfb713c15db173cc96daf2bd1a81 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Tue, 1 Aug 2023 13:15:58 -0400 Subject: Setting up the inclusion of Vector. Added Vector and all supporting files. 
--- src/calling-convention.adoc | 29 +++++ src/example/memcpy.s | 17 +++ src/example/saxpy.s | 29 +++++ src/example/sgemm.S | 221 ++++++++++++++++++++++++++++++++++ src/example/strcmp.s | 34 ++++++ src/example/strcpy.s | 20 +++ src/example/strlen.s | 22 ++++ src/example/strncpy.s | 36 ++++++ src/example/vvaddint32.s | 22 ++++ src/fraclmul.adoc | 175 +++++++++++++++++++++++++++ src/images/wavedrom/inst-table.adoc | 209 ++++++++++++++++++++++++++++++++ src/images/wavedrom/valu-format.adoc | 97 +++++++++++++++ src/images/wavedrom/vcfg-format.adoc | 44 +++++++ src/images/wavedrom/vfrec7.adoc | 136 +++++++++++++++++++++ src/images/wavedrom/vfrsqrt7.adoc | 139 +++++++++++++++++++++ src/images/wavedrom/vmem-format.adoc | 102 ++++++++++++++++ src/images/wavedrom/vtype-format.adoc | 27 +++++ src/v-st-ext.adoc | 22 ++-- src/vector-examples.adoc | 123 +++++++++++++++++++ 19 files changed, 1493 insertions(+), 11 deletions(-) create mode 100644 src/calling-convention.adoc create mode 100644 src/example/memcpy.s create mode 100644 src/example/saxpy.s create mode 100644 src/example/sgemm.S create mode 100644 src/example/strcmp.s create mode 100644 src/example/strcpy.s create mode 100644 src/example/strlen.s create mode 100644 src/example/strncpy.s create mode 100644 src/example/vvaddint32.s create mode 100644 src/fraclmul.adoc create mode 100644 src/images/wavedrom/inst-table.adoc create mode 100644 src/images/wavedrom/valu-format.adoc create mode 100644 src/images/wavedrom/vcfg-format.adoc create mode 100644 src/images/wavedrom/vfrec7.adoc create mode 100644 src/images/wavedrom/vfrsqrt7.adoc create mode 100644 src/images/wavedrom/vmem-format.adoc create mode 100644 src/images/wavedrom/vtype-format.adoc create mode 100644 src/vector-examples.adoc diff --git a/src/calling-convention.adoc b/src/calling-convention.adoc new file mode 100644 index 0000000..9ea5505 --- /dev/null +++ b/src/calling-convention.adoc @@ -0,0 +1,29 @@ +[appendix] +== Calling Convention (Not authoritative - 
Placeholder Only) + +NOTE: This Appendix is only a placeholder to help explain the +conventions used in the code examples, and is not considered frozen or +part of the ratification process. The official RISC-V psABI document +is being expanded to specify the vector calling conventions. + +In the RISC-V psABI, the vector registers `v0`-`v31` are all caller-saved. +The `vl` and `vtype` CSRs are also caller-saved. + +Procedures may assume that `vstart` is zero upon entry. Procedures may +assume that `vstart` is zero upon return from a procedure call. + +NOTE: Application software should normally not write `vstart` explicitly. +Any procedure that does explicitly write `vstart` to a nonzero value must +zero `vstart` before either returning or calling another procedure. + +The `vxrm` and `vxsat` fields of `vcsr` have thread storage duration. + +Executing a system call causes all caller-saved vector registers +(`v0`-`v31`, `vl`, `vtype`) and `vstart` to become unspecified. + +NOTE: This scheme allows system calls that cause context switches to avoid +saving and later restoring the vector registers. + +NOTE: Most OSes will choose to either leave these registers intact or reset +them to their initial state to avoid leaking information across process +boundaries. diff --git a/src/example/memcpy.s b/src/example/memcpy.s new file mode 100644 index 0000000..5f6318a --- /dev/null +++ b/src/example/memcpy.s @@ -0,0 +1,17 @@ + .text + .balign 4 + .global memcpy + # void *memcpy(void* dest, const void* src, size_t n) + # a0=dest, a1=src, a2=n + # + memcpy: + mv a3, a0 # Copy destination + loop: + vsetvli t0, a2, e8, m8, ta, ma # Vectors of 8b + vle8.v v0, (a1) # Load bytes + add a1, a1, t0 # Bump pointer + sub a2, a2, t0 # Decrement count + vse8.v v0, (a3) # Store bytes + add a3, a3, t0 # Bump pointer + bnez a2, loop # Any more? 
+ ret # Return diff --git a/src/example/saxpy.s b/src/example/saxpy.s new file mode 100644 index 0000000..de7f224 --- /dev/null +++ b/src/example/saxpy.s @@ -0,0 +1,29 @@ + .text + .balign 4 + .global saxpy +# void +# saxpy(size_t n, const float a, const float *x, float *y) +# { +# size_t i; +# for (i=0; iThis Inner Loop Header: Depth=1 + add s9, a2, s6 + vsetvli s1, zero, e8,m1,ta,mu + vle8.v v25, (s9) + add s1, a3, s6 + vle8.v v26, (s1) + vadd.vv v25, v26, v25 + add s1, a1, s6 + vse8.v v25, (s1) + add s9, a5, s10 + vsetvli s1, zero, e64,m8,ta,mu + vle64.v v8, (s9) + add s1, a6, s10 + vle64.v v16, (s1) + add s1, a7, s10 + vle64.v v24, (s1) + add s1, s3, s10 + vle64.v v0, (s1) + sd a0, -112(s0) + ld a0, -128(s0) + vs8r.v v0, (a0) # Spill LMUL=8 + add s9, t6, s10 + add s11, t5, s10 + add ra, t2, s10 + add s1, t3, s10 + vle64.v v0, (s9) + ld s9, -136(s0) + vs8r.v v0, (s9) # Spill LMUL=8 + vle64.v v0, (s11) + ld s9, -144(s0) + vs8r.v v0, (s9) # Spill LMUL=8 + vle64.v v0, (ra) + ld s9, -160(s0) + vs8r.v v0, (s9) # Spill LMUL=8 + vle64.v v0, (s1) + ld s1, -152(s0) + vs8r.v v0, (s1) # Spill LMUL=8 + vadd.vv v16, v16, v8 + ld s1, -128(s0) + vl8r.v v8, (s1) # Reload LMUL=8 + vadd.vv v8, v8, v24 + ld s1, -136(s0) + vl8r.v v24, (s1) # Reload LMUL=8 + ld s1, -144(s0) + vl8r.v v0, (s1) # Reload LMUL=8 + vadd.vv v24, v0, v24 + ld s1, -128(s0) + vs8r.v v24, (s1) # Spill LMUL=8 + ld s1, -152(s0) + vl8r.v v0, (s1) # Reload LMUL=8 + ld s1, -160(s0) + vl8r.v v24, (s1) # Reload LMUL=8 + vadd.vv v0, v0, v24 + add s1, a4, s10 + vse64.v v16, (s1) + add s1, s2, s10 + vse64.v v8, (s1) + vadd.vv v8, v8, v16 + add s1, t4, s10 + ld s9, -128(s0) + vl8r.v v16, (s9) # Reload LMUL=8 + vse64.v v16, (s1) + add s9, t0, s10 + vadd.vv v8, v8, v16 + vle64.v v16, (s9) + add s1, t1, s10 + vse64.v v0, (s1) + vadd.vv v8, v8, v0 + vsll.vi v16, v16, 1 + vadd.vv v8, v8, v16 + vse64.v v8, (s9) + add s6, s6, s7 + add s10, s10, s8 + bne s6, s4, .LBB0_4 +---- + +If instead of using LMUL=1 for the 8-bit 
computation, the compiler is allowed +to use a fractional LMUL=1/2, then the 64-bit computations can be performed +using LMUL=4 (note that the same ratio of 64-bit elements and 8-bit elements is +preserved as in the previous example). Now the compiler has 8 available +registers to perform register allocation, resulting in no spill code, as +shown in the loop below: + +---- +.LBB0_4: # %vector.body + # =>This Inner Loop Header: Depth=1 + add s9, a2, s6 + vsetvli s1, zero, e8,mf2,ta,mu // LMUL=1/2 ! + vle8.v v25, (s9) + add s1, a3, s6 + vle8.v v26, (s1) + vadd.vv v25, v26, v25 + add s1, a1, s6 + vse8.v v25, (s1) + add s9, a5, s10 + vsetvli s1, zero, e64,m4,ta,mu // LMUL=4 + vle64.v v28, (s9) + add s1, a6, s10 + vle64.v v8, (s1) + vadd.vv v28, v8, v28 + add s1, a7, s10 + vle64.v v8, (s1) + add s1, s3, s10 + vle64.v v12, (s1) + add s1, t6, s10 + vle64.v v16, (s1) + add s1, t5, s10 + vle64.v v20, (s1) + add s1, a4, s10 + vse64.v v28, (s1) + vadd.vv v8, v12, v8 + vadd.vv v12, v20, v16 + add s1, t2, s10 + vle64.v v16, (s1) + add s1, t3, s10 + vle64.v v20, (s1) + add s1, s2, s10 + vse64.v v8, (s1) + add s9, t4, s10 + vadd.vv v16, v20, v16 + add s11, t0, s10 + vle64.v v20, (s11) + vse64.v v12, (s9) + add s1, t1, s10 + vse64.v v16, (s1) + vsll.vi v20, v20, 1 + vadd.vv v28, v8, v28 + vadd.vv v28, v28, v12 + vadd.vv v28, v28, v16 + vadd.vv v28, v28, v20 + vse64.v v28, (s11) + add s6, s6, s7 + add s10, s10, s8 + bne s6, s4, .LBB0_4 +---- diff --git a/src/images/wavedrom/inst-table.adoc b/src/images/wavedrom/inst-table.adoc new file mode 100644 index 0000000..1c3511b --- /dev/null +++ b/src/images/wavedrom/inst-table.adoc @@ -0,0 +1,209 @@ + +// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] +|=== +5+| Integer 4+| Integer 4+| FP + +| funct3 | | | | | funct3 | | | | funct3 | | | +| OPIVV |V| | | | OPMVV |V| | | OPFVV |V| | +| OPIVX | |X| | | OPMVX | |X| | OPFVF | |F| +| OPIVI | | |I| | | | | | | | | +|=== + +// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] +|=== +5+| funct6 4+| funct6 4+| funct6 + +| 
000000 |V|X|I| vadd | 000000 |V| | vredsum | 000000 |V|F| vfadd +| 000001 | | | | | 000001 |V| | vredand | 000001 |V| | vfredusum +| 000010 |V|X| | vsub | 000010 |V| | vredor | 000010 |V|F| vfsub +| 000011 | |X|I| vrsub | 000011 |V| | vredxor | 000011 |V| | vfredosum +| 000100 |V|X| | vminu | 000100 |V| | vredminu | 000100 |V|F| vfmin +| 000101 |V|X| | vmin | 000101 |V| | vredmin | 000101 |V| | vfredmin +| 000110 |V|X| | vmaxu | 000110 |V| | vredmaxu | 000110 |V|F| vfmax +| 000111 |V|X| | vmax | 000111 |V| | vredmax | 000111 |V| | vfredmax +| 001000 | | | | | 001000 |V|X| vaaddu | 001000 |V|F| vfsgnj +| 001001 |V|X|I| vand | 001001 |V|X| vaadd | 001001 |V|F| vfsgnjn +| 001010 |V|X|I| vor | 001010 |V|X| vasubu | 001010 |V|F| vfsgnjx +| 001011 |V|X|I| vxor | 001011 |V|X| vasub | 001011 | | | +| 001100 |V|X|I| vrgather | 001100 | | | | 001100 | | | +| 001101 | | | | | 001101 | | | | 001101 | | | +| 001110 | |X|I| vslideup | 001110 | |X| vslide1up | 001110 | |F| vfslide1up +| 001110 |V| | |vrgatherei16| | | | | | | | +| 001111 | |X|I| vslidedown | 001111 | |X| vslide1down | 001111 | |F| vfslide1down +|=== + +// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] +|=== +5+| funct6 4+| funct6 4+| funct6 + +| 010000 |V|X|I| vadc | 010000 |V| | VWXUNARY0 | 010000 |V| | VWFUNARY0 +| | | | | | 010000 | |X| VRXUNARY0 | 010000 | |F| VRFUNARY0 +| 010001 |V|X|I| vmadc | 010001 | | | | 010001 | | | +| 010010 |V|X| | vsbc | 010010 |V| | VXUNARY0 | 010010 |V| | VFUNARY0 +| 010011 |V|X| | vmsbc | 010011 | | | | 010011 |V| | VFUNARY1 +| 010100 | | | | | 010100 |V| | VMUNARY0 | 010100 | | | +| 010101 | | | | | 010101 | | | | 010101 | | | +| 010110 | | | | | 010110 | | | | 010110 | | | +| 010111 |V|X|I| vmerge/vmv | 010111 |V| | vcompress | 010111 | |F| vfmerge/vfmv +| 011000 |V|X|I| vmseq | 011000 |V| | vmandn | 011000 |V|F| vmfeq +| 011001 |V|X|I| vmsne | 011001 |V| | vmand | 011001 |V|F| vmfle +| 011010 |V|X| | vmsltu | 011010 |V| | vmor | 011010 | | | +| 011011 |V|X| | vmslt | 011011 |V| | vmxor | 
011011 |V|F| vmflt +| 011100 |V|X|I| vmsleu | 011100 |V| | vmorn | 011100 |V|F| vmfne +| 011101 |V|X|I| vmsle | 011101 |V| | vmnand | 011101 | |F| vmfgt +| 011110 | |X|I| vmsgtu | 011110 |V| | vmnor | 011110 | | | +| 011111 | |X|I| vmsgt | 011111 |V| | vmxnor | 011111 | |F| vmfge +|=== + +// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] +|=== +5+| funct6 4+| funct6 4+| funct6 + +| 100000 |V|X|I| vsaddu | 100000 |V|X| vdivu | 100000 |V|F| vfdiv +| 100001 |V|X|I| vsadd | 100001 |V|X| vdiv | 100001 | |F| vfrdiv +| 100010 |V|X| | vssubu | 100010 |V|X| vremu | 100010 | | | +| 100011 |V|X| | vssub | 100011 |V|X| vrem | 100011 | | | +| 100100 | | | | | 100100 |V|X| vmulhu | 100100 |V|F| vfmul +| 100101 |V|X|I| vsll | 100101 |V|X| vmul | 100101 | | | +| 100110 | | | | | 100110 |V|X| vmulhsu | 100110 | | | +| 100111 |V|X| | vsmul | 100111 |V|X| vmulh | 100111 | |F| vfrsub +| 100111 | | |I| vmvr | | | | | | | | +| 101000 |V|X|I| vsrl | 101000 | | | | 101000 |V|F| vfmadd +| 101001 |V|X|I| vsra | 101001 |V|X| vmadd | 101001 |V|F| vfnmadd +| 101010 |V|X|I| vssrl | 101010 | | | | 101010 |V|F| vfmsub +| 101011 |V|X|I| vssra | 101011 |V|X| vnmsub | 101011 |V|F| vfnmsub +| 101100 |V|X|I| vnsrl | 101100 | | | | 101100 |V|F| vfmacc +| 101101 |V|X|I| vnsra | 101101 |V|X| vmacc | 101101 |V|F| vfnmacc +| 101110 |V|X|I| vnclipu | 101110 | | | | 101110 |V|F| vfmsac +| 101111 |V|X|I| vnclip | 101111 |V|X| vnmsac | 101111 |V|F| vfnmsac +|=== + +// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] +|=== +5+| funct6 4+| funct6 4+| funct6 + +| 110000 |V| | | vwredsumu | 110000 |V|X| vwaddu | 110000 |V|F| vfwadd +| 110001 |V| | | vwredsum | 110001 |V|X| vwadd | 110001 |V| | vfwredusum +| 110010 | | | | | 110010 |V|X| vwsubu | 110010 |V|F| vfwsub +| 110011 | | | | | 110011 |V|X| vwsub | 110011 |V| | vfwredosum +| 110100 | | | | | 110100 |V|X| vwaddu.w | 110100 |V|F| vfwadd.w +| 110101 | | | | | 110101 |V|X| vwadd.w | 110101 | | | +| 110110 | | | | | 110110 |V|X| vwsubu.w | 110110 |V|F| vfwsub.w +| 110111 | | | | | 110111 
|V|X| vwsub.w | 110111 | | | +| 111000 | | | | | 111000 |V|X| vwmulu | 111000 |V|F| vfwmul +| 111001 | | | | | 111001 | | | | 111001 | | | +| 111010 | | | | | 111010 |V|X| vwmulsu | 111010 | | | +| 111011 | | | | | 111011 |V|X| vwmul | 111011 | | | +| 111100 | | | | | 111100 |V|X| vwmaccu | 111100 |V|F| vfwmacc +| 111101 | | | | | 111101 |V|X| vwmacc | 111101 |V|F| vfwnmacc +| 111110 | | | | | 111110 | |X| vwmaccus | 111110 |V|F| vfwmsac +| 111111 | | | | | 111111 |V|X| vwmaccsu | 111111 |V|F| vfwnmsac +|=== + +<<< + +.VRXUNARY0 encoding space +[cols="2,14"] +|=== +| vs2 | + +| 00000 | vmv.s.x +|=== + +.VWXUNARY0 encoding space +[cols="2,14"] +|=== +| vs1 | + +| 00000 | vmv.x.s +| 10000 | vcpop +| 10001 | vfirst +|=== + +.VXUNARY0 encoding space +[cols="2,14"] +|=== +| vs1 | + +| 00010 | vzext.vf8 +| 00011 | vsext.vf8 +| 00100 | vzext.vf4 +| 00101 | vsext.vf4 +| 00110 | vzext.vf2 +| 00111 | vsext.vf2 +|=== + +.VRFUNARY0 encoding space +[cols="2,14"] +|=== +| vs2 | + +| 00000 | vfmv.s.f +|=== + +.VWFUNARY0 encoding space +[cols="2,14"] +|=== +| vs1 | + +| 00000 | vfmv.f.s +|=== + +.VFUNARY0 encoding space +[cols="2,14"] +|=== +| vs1 | name + +2+| single-width converts +| 00000 | vfcvt.xu.f.v +| 00001 | vfcvt.x.f.v +| 00010 | vfcvt.f.xu.v +| 00011 | vfcvt.f.x.v +| 00110 | vfcvt.rtz.xu.f.v +| 00111 | vfcvt.rtz.x.f.v +| | +2+| widening converts +| 01000 | vfwcvt.xu.f.v +| 01001 | vfwcvt.x.f.v +| 01010 | vfwcvt.f.xu.v +| 01011 | vfwcvt.f.x.v +| 01100 | vfwcvt.f.f.v +| 01110 | vfwcvt.rtz.xu.f.v +| 01111 | vfwcvt.rtz.x.f.v +| | +2+| narrowing converts +| 10000 | vfncvt.xu.f.w +| 10001 | vfncvt.x.f.w +| 10010 | vfncvt.f.xu.w +| 10011 | vfncvt.f.x.w +| 10100 | vfncvt.f.f.w +| 10101 | vfncvt.rod.f.f.w +| 10110 | vfncvt.rtz.xu.f.w +| 10111 | vfncvt.rtz.x.f.w +|=== + +.VFUNARY1 encoding space +[cols="2,14"] +|=== +| vs1 | name + +| 00000 | vfsqrt.v +| 00100 | vfrsqrt7.v +| 00101 | vfrec7.v +| 10000 | vfclass.v +|=== + + +.VMUNARY0 encoding space +[cols="2,14"] +|=== +| vs1 | + 
+| 00001 | vmsbf +| 00010 | vmsof +| 00011 | vmsif +| 10000 | viota +| 10001 | vid +|=== + + diff --git a/src/images/wavedrom/valu-format.adoc b/src/images/wavedrom/valu-format.adoc new file mode 100644 index 0000000..c6f6f52 --- /dev/null +++ b/src/images/wavedrom/valu-format.adoc @@ -0,0 +1,97 @@ +Formats for Vector Arithmetic Instructions under OP-V major opcode + +//// +31 26 25 24 20 19 15 14 12 11 7 6 0 + funct6 | vm | vs2 | vs1 | 0 0 0 | vd |1010111| OP-V (OPIVV) + funct6 | vm | vs2 | vs1 | 0 0 1 | vd/rd |1010111| OP-V (OPFVV) + funct6 | vm | vs2 | vs1 | 0 1 0 | vd/rd |1010111| OP-V (OPMVV) + funct6 | vm | vs2 | imm[4:0] | 0 1 1 | vd |1010111| OP-V (OPIVI) + funct6 | vm | vs2 | rs1 | 1 0 0 | vd |1010111| OP-V (OPIVX) + funct6 | vm | vs2 | rs1 | 1 0 1 | vd |1010111| OP-V (OPFVF) + funct6 | vm | vs2 | rs1 | 1 1 0 | vd/rd |1010111| OP-V (OPMVX) + 6 1 5 5 3 5 7 +//// + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'OPIVV'}, + {bits: 5, name: 'vd', type: 2}, + {bits: 3, name: 0}, + {bits: 5, name: 'vs1', type: 2}, + {bits: 5, name: 'vs2', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 6, name: 'funct6'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'OPFVV'}, + {bits: 5, name: 'vd / rd', type: 7}, + {bits: 3, name: 1}, + {bits: 5, name: 'vs1', type: 2}, + {bits: 5, name: 'vs2', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 6, name: 'funct6'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'OPMVV'}, + {bits: 5, name: 'vd / rd', type: 7}, + {bits: 3, name: 2}, + {bits: 5, name: 'vs1', type: 2}, + {bits: 5, name: 'vs2', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 6, name: 'funct6'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: ['OPIVI']}, + {bits: 5, name: 'vd', type: 2}, + {bits: 3, name: 3}, + {bits: 5, name: 'imm[4:0]', type: 5}, + {bits: 5, name: 'vs2', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 6, name: 'funct6'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'OPIVX'}, + {bits: 5, name: 'vd', 
type: 2}, + {bits: 3, name: 4}, + {bits: 5, name: 'rs1', type: 4}, + {bits: 5, name: 'vs2', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 6, name: 'funct6'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'OPFVF'}, + {bits: 5, name: 'vd', type: 2}, + {bits: 3, name: 5}, + {bits: 5, name: 'rs1', type: 4}, + {bits: 5, name: 'vs2', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 6, name: 'funct6'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'OPMVX'}, + {bits: 5, name: 'vd / rd', type: 7}, + {bits: 3, name: 6}, + {bits: 5, name: 'rs1', type: 4}, + {bits: 5, name: 'vs2', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 6, name: 'funct6'}, +]} +``` diff --git a/src/images/wavedrom/vcfg-format.adoc b/src/images/wavedrom/vcfg-format.adoc new file mode 100644 index 0000000..f1bb4c0 --- /dev/null +++ b/src/images/wavedrom/vcfg-format.adoc @@ -0,0 +1,44 @@ +Formats for Vector Configuration Instructions under OP-V major opcode + +//// + 31 30 25 24 20 19 15 14 12 11 7 6 0 + 0 | zimm[10:0] | rs1 | 1 1 1 | rd |1010111| vsetvli + 1 | 1| zimm[ 9:0] | uimm[4:0]| 1 1 1 | rd |1010111| vsetivli + 1 | 000000 | rs2 | rs1 | 1 1 1 | rd |1010111| vsetvl + 1 6 5 5 3 5 7 +//// + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'vsetvli'}, + {bits: 5, name: 'rd', type: 4}, + {bits: 3, name: 7}, + {bits: 5, name: 'rs1', type: 4}, + {bits: 11, name: 'vtypei[10:0]', type: 5}, + {bits: 1, name: '0'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'vsetivli'}, + {bits: 5, name: 'rd', type: 4}, + {bits: 3, name: 7}, + {bits: 5, name: 'uimm[4:0]', type: 5}, + {bits: 10, name: 'vtypei[9:0]', type: 5}, + {bits: 1, name: '1'}, + {bits: 1, name: '1'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'vsetvl'}, + {bits: 5, name: 'rd', type: 4}, + {bits: 3, name: 7}, + {bits: 5, name: 'rs1', type: 4}, + {bits: 5, name: 'rs2', type: 4}, + {bits: 6, name: 0x00}, + {bits: 1, name: 1}, +]} +``` diff --git a/src/images/wavedrom/vfrec7.adoc 
b/src/images/wavedrom/vfrec7.adoc new file mode 100644 index 0000000..02abe60 --- /dev/null +++ b/src/images/wavedrom/vfrec7.adoc @@ -0,0 +1,136 @@ +.vfrec7.v common-case lookup table contents +[%autowidth] +|=== + +| sig[MSB -: 7] | sig_out[MSB -: 7] + +| 0 | 127 +| 1 | 125 +| 2 | 123 +| 3 | 121 +| 4 | 119 +| 5 | 117 +| 6 | 116 +| 7 | 114 +| 8 | 112 +| 9 | 110 +| 10 | 109 +| 11 | 107 +| 12 | 105 +| 13 | 104 +| 14 | 102 +| 15 | 100 +| 16 | 99 +| 17 | 97 +| 18 | 96 +| 19 | 94 +| 20 | 93 +| 21 | 91 +| 22 | 90 +| 23 | 88 +| 24 | 87 +| 25 | 85 +| 26 | 84 +| 27 | 83 +| 28 | 81 +| 29 | 80 +| 30 | 79 +| 31 | 77 +| 32 | 76 +| 33 | 75 +| 34 | 74 +| 35 | 72 +| 36 | 71 +| 37 | 70 +| 38 | 69 +| 39 | 68 +| 40 | 66 +| 41 | 65 +| 42 | 64 +| 43 | 63 +| 44 | 62 +| 45 | 61 +| 46 | 60 +| 47 | 59 +| 48 | 58 +| 49 | 57 +| 50 | 56 +| 51 | 55 +| 52 | 54 +| 53 | 53 +| 54 | 52 +| 55 | 51 +| 56 | 50 +| 57 | 49 +| 58 | 48 +| 59 | 47 +| 60 | 46 +| 61 | 45 +| 62 | 44 +| 63 | 43 +| 64 | 42 +| 65 | 41 +| 66 | 40 +| 67 | 40 +| 68 | 39 +| 69 | 38 +| 70 | 37 +| 71 | 36 +| 72 | 35 +| 73 | 35 +| 74 | 34 +| 75 | 33 +| 76 | 32 +| 77 | 31 +| 78 | 31 +| 79 | 30 +| 80 | 29 +| 81 | 28 +| 82 | 28 +| 83 | 27 +| 84 | 26 +| 85 | 25 +| 86 | 25 +| 87 | 24 +| 88 | 23 +| 89 | 23 +| 90 | 22 +| 91 | 21 +| 92 | 21 +| 93 | 20 +| 94 | 19 +| 95 | 19 +| 96 | 18 +| 97 | 17 +| 98 | 17 +| 99 | 16 +| 100 | 15 +| 101 | 15 +| 102 | 14 +| 103 | 14 +| 104 | 13 +| 105 | 12 +| 106 | 12 +| 107 | 11 +| 108 | 11 +| 109 | 10 +| 110 | 9 +| 111 | 9 +| 112 | 8 +| 113 | 8 +| 114 | 7 +| 115 | 7 +| 116 | 6 +| 117 | 5 +| 118 | 5 +| 119 | 4 +| 120 | 4 +| 121 | 3 +| 122 | 3 +| 123 | 2 +| 124 | 2 +| 125 | 1 +| 126 | 1 +| 127 | 0 + +|=== diff --git a/src/images/wavedrom/vfrsqrt7.adoc b/src/images/wavedrom/vfrsqrt7.adoc new file mode 100644 index 0000000..ace8022 --- /dev/null +++ b/src/images/wavedrom/vfrsqrt7.adoc @@ -0,0 +1,139 @@ +.vfrsqrt7.v common-case lookup table contents +[%autowidth] +|=== + +|exp[0] | sig[MSB -: 6] | sig_out[MSB -: 7] 
+ +.64+|0 +| 0 | 52 +| 1 | 51 +| 2 | 50 +| 3 | 48 +| 4 | 47 +| 5 | 46 +| 6 | 44 +| 7 | 43 +| 8 | 42 +| 9 | 41 +| 10 | 40 +| 11 | 39 +| 12 | 38 +| 13 | 36 +| 14 | 35 +| 15 | 34 +| 16 | 33 +| 17 | 32 +| 18 | 31 +| 19 | 30 +| 20 | 30 +| 21 | 29 +| 22 | 28 +| 23 | 27 +| 24 | 26 +| 25 | 25 +| 26 | 24 +| 27 | 23 +| 28 | 23 +| 29 | 22 +| 30 | 21 +| 31 | 20 +| 32 | 19 +| 33 | 19 +| 34 | 18 +| 35 | 17 +| 36 | 16 +| 37 | 16 +| 38 | 15 +| 39 | 14 +| 40 | 14 +| 41 | 13 +| 42 | 12 +| 43 | 12 +| 44 | 11 +| 45 | 10 +| 46 | 10 +| 47 | 9 +| 48 | 9 +| 49 | 8 +| 50 | 7 +| 51 | 7 +| 52 | 6 +| 53 | 6 +| 54 | 5 +| 55 | 4 +| 56 | 4 +| 57 | 3 +| 58 | 3 +| 59 | 2 +| 60 | 2 +| 61 | 1 +| 62 | 1 +| 63 | 0 + +.64+|1 +| 0 | 127 +| 1 | 125 +| 2 | 123 +| 3 | 121 +| 4 | 119 +| 5 | 118 +| 6 | 116 +| 7 | 114 +| 8 | 113 +| 9 | 111 +| 10 | 109 +| 11 | 108 +| 12 | 106 +| 13 | 105 +| 14 | 103 +| 15 | 102 +| 16 | 100 +| 17 | 99 +| 18 | 97 +| 19 | 96 +| 20 | 95 +| 21 | 93 +| 22 | 92 +| 23 | 91 +| 24 | 90 +| 25 | 88 +| 26 | 87 +| 27 | 86 +| 28 | 85 +| 29 | 84 +| 30 | 83 +| 31 | 82 +| 32 | 80 +| 33 | 79 +| 34 | 78 +| 35 | 77 +| 36 | 76 +| 37 | 75 +| 38 | 74 +| 39 | 73 +| 40 | 72 +| 41 | 71 +| 42 | 70 +| 43 | 70 +| 44 | 69 +| 45 | 68 +| 46 | 67 +| 47 | 66 +| 48 | 65 +| 49 | 64 +| 50 | 63 +| 51 | 63 +| 52 | 62 +| 53 | 61 +| 54 | 60 +| 55 | 59 +| 56 | 59 +| 57 | 58 +| 58 | 57 +| 59 | 56 +| 60 | 56 +| 61 | 55 +| 62 | 54 +| 63 | 53 + +|=== diff --git a/src/images/wavedrom/vmem-format.adoc b/src/images/wavedrom/vmem-format.adoc new file mode 100644 index 0000000..3b20043 --- /dev/null +++ b/src/images/wavedrom/vmem-format.adoc @@ -0,0 +1,102 @@ +Format for Vector Load Instructions under LOAD-FP major opcode + +//// +31 29 28 27 26 25 24 20 19 15 14 12 11 7 6 0 + nf | mew| mop | vm | lumop | rs1 | width | vd |0000111| VL* unit-stride + nf | mew| mop | vm | rs2 | rs1 | width | vd |0000111| VLS* strided + nf | mew| mop | vm | vs2 | rs1 | width | vd |0000111| VLX* indexed + 3 1 2 1 5 5 3 5 7 +//// + +```wavedrom 
+{reg: [ + {bits: 7, name: 0x7, attr: 'VL* unit-stride'}, + {bits: 5, name: 'vd', attr: 'destination of load', type: 2}, + {bits: 3, name: 'width'}, + {bits: 5, name: 'rs1', attr: 'base address', type: 4}, + {bits: 5, name: 'lumop'}, + {bits: 1, name: 'vm'}, + {bits: 2, name: 'mop'}, + {bits: 1, name: 'mew'}, + {bits: 3, name: 'nf'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x7, attr: 'VLS* strided'}, + {bits: 5, name: 'vd', attr: 'destination of load', type: 2}, + {bits: 3, name: 'width'}, + {bits: 5, name: 'rs1', attr: 'base address', type: 4}, + {bits: 5, name: 'rs2', attr: 'stride', type: 4}, + {bits: 1, name: 'vm'}, + {bits: 2, name: 'mop'}, + {bits: 1, name: 'mew'}, + {bits: 3, name: 'nf'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x7, attr: 'VLX* indexed'}, + {bits: 5, name: 'vd', attr: 'destination of load', type: 2}, + {bits: 3, name: 'width'}, + {bits: 5, name: 'rs1', attr: 'base address', type: 4}, + {bits: 5, name: 'vs2', attr: 'address offsets', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 2, name: 'mop'}, + {bits: 1, name: 'mew'}, + {bits: 3, name: 'nf'}, +]} +``` +Format for Vector Store Instructions under STORE-FP major opcode + +//// +31 29 28 27 26 25 24 20 19 15 14 12 11 7 6 0 + nf | mew| mop | vm | sumop | rs1 | width | vs3 |0100111| VS* unit-stride + nf | mew| mop | vm | rs2 | rs1 | width | vs3 |0100111| VSS* strided + nf | mew| mop | vm | vs2 | rs1 | width | vs3 |0100111| VSX* indexed + 3 1 2 1 5 5 3 5 7 +//// + +```wavedrom +{reg: [ + {bits: 7, name: 0x27, attr: 'VS* unit-stride'}, + {bits: 5, name: 'vs3', attr: 'store data', type: 2}, + {bits: 3, name: 'width'}, + {bits: 5, name: 'rs1', attr: 'base address', type: 4}, + {bits: 5, name: 'sumop'}, + {bits: 1, name: 'vm'}, + {bits: 2, name: 'mop'}, + {bits: 1, name: 'mew'}, + {bits: 3, name: 'nf'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x27, attr: 'VSS* strided'}, + {bits: 5, name: 'vs3', attr: 'store data', type: 2}, + {bits: 3, name: 'width'}, + {bits: 5, name: 
'rs1', attr: 'base address', type: 4}, + {bits: 5, name: 'rs2', attr: 'stride', type: 4}, + {bits: 1, name: 'vm'}, + {bits: 2, name: 'mop'}, + {bits: 1, name: 'mew'}, + {bits: 3, name: 'nf'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x27, attr: 'VSX* indexed'}, + {bits: 5, name: 'vs3', attr: 'store data', type: 2}, + {bits: 3, name: 'width'}, + {bits: 5, name: 'rs1', attr: 'base address', type: 4}, + {bits: 5, name: 'vs2', attr: 'address offsets', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 2, name: 'mop'}, + {bits: 1, name: 'mew'}, + {bits: 3, name: 'nf'}, +]} +``` diff --git a/src/images/wavedrom/vtype-format.adoc b/src/images/wavedrom/vtype-format.adoc new file mode 100644 index 0000000..a97af34 --- /dev/null +++ b/src/images/wavedrom/vtype-format.adoc @@ -0,0 +1,27 @@ +```wavedrom +{reg: [ + {bits: 3, name: 'vlmul[2:0]'}, + {bits: 3, name: 'vsew[2:0]'}, + {bits: 1, name: 'vta'}, + {bits: 1, name: 'vma'}, + {bits: 23, name: 'reserved'}, + {bits: 1, name: 'vill'}, +]} +``` + +NOTE: This diagram shows the layout for RV32 systems, whereas in +general `vill` should be at bit XLEN-1. + +.`vtype` register layout +[cols=">2,4,10"] +[%autowidth] +|=== +| Bits | Name | Description + +| XLEN-1 | vill | Illegal value if set +| XLEN-2:8 | 0 | Reserved if non-zero +| 7 | vma | Vector mask agnostic +| 6 | vta | Vector tail agnostic +| 5:3 | vsew[2:0] | Selected element width (SEW) setting +| 2:0 | vlmul[2:0] | Vector register group multiplier (LMUL) setting +|=== diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc index e52bc59..619492b 100644 --- a/src/v-st-ext.adoc +++ b/src/v-st-ext.adoc @@ -165,7 +165,7 @@ The `vtype` register has five fields, `vill`, `vma`, `vta`, `vsew[2:0]`, and `vlmul[2:0]`. Bits `vtype[XLEN-2:8]` should be written with zero, and non-zero values in this field are reserved. 
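Illustration only (not part of the patch or of the specification text): the `vtype` bit positions listed in the layout table above can be checked with a short Python sketch. The helper name `decode_vtype` and its `xlen` parameter are hypothetical names chosen for this example; only the bit positions are taken from the table.

```python
def decode_vtype(vtype: int, xlen: int = 32) -> dict:
    """Split a raw vtype value into the fields of the layout table.

    Hypothetical helper for illustration; xlen is the register width in bits.
    """
    reserved_width = xlen - 9  # bits XLEN-2:8 span (XLEN-2) - 8 + 1 bits
    return {
        "vill": (vtype >> (xlen - 1)) & 1,                       # bit XLEN-1
        "reserved": (vtype >> 8) & ((1 << reserved_width) - 1),  # bits XLEN-2:8
        "vma": (vtype >> 7) & 1,                                 # bit 7
        "vta": (vtype >> 6) & 1,                                 # bit 6
        "vsew": (vtype >> 3) & 0b111,                            # bits 5:3
        "vlmul": vtype & 0b111,                                  # bits 2:0
    }


if __name__ == "__main__":
    # vma=1, vta=1, vsew=0b010, vlmul=0b001
    print(decode_vtype(0b11010001))
```

With `xlen=32` the reserved field is 23 bits wide, which matches the `{bits: 23, name: 'reserved'}` entry in the wavedrom diagram; a non-zero `reserved` value in the result indicates a reserved encoding.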
-include::vtype-format.adoc[] +include::images/wavedrom/vtype-format.adoc[] NOTE: A small implementation supporting ELEN=32 requires only seven bits of state in `vtype`: two bits for `ma` and `ta`, two bits for @@ -887,11 +887,11 @@ floating-point load/store 12-bit immediate field to provide further vector instruction encoding, with bit 25 holding the standard vector mask bit (see <>). -include::vmem-format.adoc[] +include::images/wavedrom/vmem-format.adoc[] -include::valu-format.adoc[] +include::images/wavedrom/valu-format.adoc[] -include::vcfg-format.adoc[] +include::images/wavedrom/vcfg-format.adoc[] Vector instructions can have scalar or vector source operands and produce scalar or vector results, and most vector instructions can be @@ -1148,11 +1148,11 @@ their arguments, and write the new value of `vl` into `rd`. vsetvl rd, rs1, rs2 # rd = new vl, rs1 = AVL, rs2 = new vtype value ---- -include::vcfg-format.adoc[] +include::images/wavedrom/vcfg-format.adoc[] ==== `vtype` encoding -include::vtype-format.adoc[] +include::images/wavedrom/vtype-format.adoc[] The new `vtype` value is encoded in the immediate fields of `vsetvli` and `vsetivli`, and in the `rs2` register for `vsetvl`. @@ -1348,7 +1348,7 @@ floating-point load/store 12-bit immediate field to provide further vector instruction encoding, with bit 25 holding the standard vector mask bit (see <>). -include::vmem-format.adoc[] +include::images/wavedrom/vmem-format.adoc[] [cols="4,12"] |=== @@ -2173,7 +2173,7 @@ The vector arithmetic instructions use a new major opcode (OP-V = 1010111~2~) which neighbors OP-FP. The three-bit `funct3` field is used to define sub-categories of vector instructions. 
-include::valu-format.adoc[] +include::images/wavedrom/valu-format.adoc[] [[sec-arithmetic-encoding]] ==== Vector Arithmetic Instruction encoding @@ -3461,7 +3461,7 @@ The following table gives the seven MSBs of the output significand as a function of the LSB of the normalized input exponent and the six MSBs of the normalized input significand; the other bits of the output significand are zero. -include::vfrsqrt7.adoc[] +include::images/wavedrom/vfrsqrt7.adoc[] NOTE: For example, when SEW=32, vfrsqrt7(0x00718abc ({approx} 1.043e-38)) = 0x5f080000 ({approx} 9.800e18), and vfrsqrt7(0x7f765432 ({approx} 3.274e38)) = 0x1f820000 ({approx} 5.506e-20). @@ -3548,7 +3548,7 @@ The following table gives the seven MSBs of the normalized output significand as a function of the seven MSBs of the normalized input significand; the other bits of the normalized output significand are zero. -include::vfrec7.adoc[] +include::images/wavedrom/vfrec7.adoc[] If the normalized output exponent is 0 or -1, the result is subnormal: the output exponent is 0, and the output significand is given by concatenating @@ -5173,5 +5173,5 @@ computation in single-precision will suffice. === Vector Instruction Listing -include::inst-table.adoc[] +include::images/wavedrom/inst-table.adoc[] diff --git a/src/vector-examples.adoc b/src/vector-examples.adoc new file mode 100644 index 0000000..dade5a4 --- /dev/null +++ b/src/vector-examples.adoc @@ -0,0 +1,123 @@ +[appendix] +== Vector Assembly Code Examples + +The following are provided as non-normative text to help explain the vector ISA. + +=== Vector-vector add example + +---- +include::example/vvaddint32.s[lines=4..-1] +---- + +=== Example with mixed-width mask and compute. + +---- +# Code using one width for predicate and different width for masked +# compute. +# int8_t a[]; int32_t b[], c[]; +# for (i=0; i Date: Tue, 1 Aug 2023 14:17:43 -0400 Subject: Fixed wavedrom syntax Added square brackets and svg values. Added period character delimiters. 
--- src/images/wavedrom/valu-format.adoc | 35 +++++++++++++++++++++-------------- 1 file changed, 21 insertions(+), 14 deletions(-) diff --git a/src/images/wavedrom/valu-format.adoc b/src/images/wavedrom/valu-format.adoc index c6f6f52..cdd3447 100644 --- a/src/images/wavedrom/valu-format.adoc +++ b/src/images/wavedrom/valu-format.adoc @@ -12,7 +12,8 @@ Formats for Vector Arithmetic Instructions under OP-V major opcode 6 1 5 5 3 5 7 //// -```wavedrom +[wavedrom,,svg] +.... {reg: [ {bits: 7, name: 0x57, attr: 'OPIVV'}, {bits: 5, name: 'vd', type: 2}, @@ -22,9 +23,10 @@ Formats for Vector Arithmetic Instructions under OP-V major opcode {bits: 1, name: 'vm'}, {bits: 6, name: 'funct6'}, ]} -``` +.... -```wavedrom +[wavedrom,,svg] +.... {reg: [ {bits: 7, name: 0x57, attr: 'OPFVV'}, {bits: 5, name: 'vd / rd', type: 7}, @@ -34,9 +36,10 @@ Formats for Vector Arithmetic Instructions under OP-V major opcode {bits: 1, name: 'vm'}, {bits: 6, name: 'funct6'}, ]} -``` +.... -```wavedrom +[wavedrom,,svg] +.... {reg: [ {bits: 7, name: 0x57, attr: 'OPMVV'}, {bits: 5, name: 'vd / rd', type: 7}, @@ -46,9 +49,10 @@ Formats for Vector Arithmetic Instructions under OP-V major opcode {bits: 1, name: 'vm'}, {bits: 6, name: 'funct6'}, ]} -``` +.... -```wavedrom +[wavedrom,,svg] +.... {reg: [ {bits: 7, name: 0x57, attr: ['OPIVI']}, {bits: 5, name: 'vd', type: 2}, @@ -58,9 +62,10 @@ Formats for Vector Arithmetic Instructions under OP-V major opcode {bits: 1, name: 'vm'}, {bits: 6, name: 'funct6'}, ]} -``` +.... -```wavedrom +[wavedrom,,svg] +.... {reg: [ {bits: 7, name: 0x57, attr: 'OPIVX'}, {bits: 5, name: 'vd', type: 2}, @@ -70,9 +75,10 @@ Formats for Vector Arithmetic Instructions under OP-V major opcode {bits: 1, name: 'vm'}, {bits: 6, name: 'funct6'}, ]} -``` +.... -```wavedrom +[wavedrom,,svg] +.... 
{reg: [ {bits: 7, name: 0x57, attr: 'OPFVF'}, {bits: 5, name: 'vd', type: 2}, @@ -82,9 +88,10 @@ Formats for Vector Arithmetic Instructions under OP-V major opcode {bits: 1, name: 'vm'}, {bits: 6, name: 'funct6'}, ]} -``` +.... -```wavedrom +[wavedrom,,svg] +.... {reg: [ {bits: 7, name: 0x57, attr: 'OPMVX'}, {bits: 5, name: 'vd / rd', type: 7}, @@ -94,4 +101,4 @@ Formats for Vector Arithmetic Instructions under OP-V major opcode {bits: 1, name: 'vm'}, {bits: 6, name: 'funct6'}, ]} -``` +.... -- cgit v1.1 From 89f655e877c231424b78c355200ba77c102f20c6 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Tue, 1 Aug 2023 14:19:47 -0400 Subject: Fixing wavedrom formatting. Added square brackets around wavedrom declaration. Added period delimiters. --- src/images/wavedrom/vtype-format.adoc | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/images/wavedrom/vtype-format.adoc b/src/images/wavedrom/vtype-format.adoc index a97af34..2f68a28 100644 --- a/src/images/wavedrom/vtype-format.adoc +++ b/src/images/wavedrom/vtype-format.adoc @@ -1,4 +1,5 @@ -```wavedrom +[wavedrom,,svg] +.... {reg: [ {bits: 3, name: 'vlmul[2:0]'}, {bits: 3, name: 'vsew[2:0]'}, @@ -7,7 +8,7 @@ {bits: 23, name: 'reserved'}, {bits: 1, name: 'vill'}, ]} -``` +.... NOTE: This diagram shows the layout for RV32 systems, whereas in general `vill` should be at bit XLEN-1. -- cgit v1.1 From f82b21024d3602f812a6ed250b08122aca856f20 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Tue, 1 Aug 2023 14:26:44 -0400 Subject: Fixing wavedrom syntax for Vector. Added square brackets around wavedrom declarations. Added period delimiters. 
--- src/images/wavedrom/vcfg-format.adoc | 15 +++++++++------ src/images/wavedrom/vmem-format.adoc | 30 ++++++++++++++++++------------ 2 files changed, 27 insertions(+), 18 deletions(-) diff --git a/src/images/wavedrom/vcfg-format.adoc b/src/images/wavedrom/vcfg-format.adoc index f1bb4c0..ac0353c 100644 --- a/src/images/wavedrom/vcfg-format.adoc +++ b/src/images/wavedrom/vcfg-format.adoc @@ -8,7 +8,8 @@ Formats for Vector Configuration Instructions under OP-V major opcode 1 6 5 5 3 5 7 //// -```wavedrom +[wavedrom,,svg] +.... {reg: [ {bits: 7, name: 0x57, attr: 'vsetvli'}, {bits: 5, name: 'rd', type: 4}, @@ -17,9 +18,10 @@ Formats for Vector Configuration Instructions under OP-V major opcode {bits: 11, name: 'vtypei[10:0]', type: 5}, {bits: 1, name: '0'}, ]} -``` +.... -```wavedrom +[wavedrom,,svg] +.... {reg: [ {bits: 7, name: 0x57, attr: 'vsetivli'}, {bits: 5, name: 'rd', type: 4}, @@ -29,9 +31,10 @@ Formats for Vector Configuration Instructions under OP-V major opcode {bits: 1, name: '1'}, {bits: 1, name: '1'}, ]} -``` +.... -```wavedrom +[wavedrom,,svg] +.... {reg: [ {bits: 7, name: 0x57, attr: 'vsetvl'}, {bits: 5, name: 'rd', type: 4}, @@ -41,4 +44,4 @@ Formats for Vector Configuration Instructions under OP-V major opcode {bits: 6, name: 0x00}, {bits: 1, name: 1}, ]} -``` +.... diff --git a/src/images/wavedrom/vmem-format.adoc b/src/images/wavedrom/vmem-format.adoc index 3b20043..f9b25ee 100644 --- a/src/images/wavedrom/vmem-format.adoc +++ b/src/images/wavedrom/vmem-format.adoc @@ -8,7 +8,8 @@ Format for Vector Load Instructions under LOAD-FP major opcode 3 1 2 1 5 5 3 5 7 //// -```wavedrom +[wavedrom,,svg] +.... {reg: [ {bits: 7, name: 0x7, attr: 'VL* unit-stride'}, {bits: 5, name: 'vd', attr: 'destination of load', type: 2}, @@ -20,9 +21,10 @@ Format for Vector Load Instructions under LOAD-FP major opcode {bits: 1, name: 'mew'}, {bits: 3, name: 'nf'}, ]} -``` +.... -```wavedrom +[wavedrom,,svg] +.... 
{reg: [ {bits: 7, name: 0x7, attr: 'VLS* strided'}, {bits: 5, name: 'vd', attr: 'destination of load', type: 2}, @@ -34,9 +36,10 @@ Format for Vector Load Instructions under LOAD-FP major opcode {bits: 1, name: 'mew'}, {bits: 3, name: 'nf'}, ]} -``` +.... -```wavedrom +[wavedrom,,svg] +.... {reg: [ {bits: 7, name: 0x7, attr: 'VLX* indexed'}, {bits: 5, name: 'vd', attr: 'destination of load', type: 2}, @@ -48,7 +51,7 @@ Format for Vector Load Instructions under LOAD-FP major opcode {bits: 1, name: 'mew'}, {bits: 3, name: 'nf'}, ]} -``` +.... Format for Vector Store Instructions under STORE-FP major opcode //// @@ -59,7 +62,8 @@ Format for Vector Store Instructions under STORE-FP major opcode 3 1 2 1 5 5 3 5 7 //// -```wavedrom +[wavedrom,,svg] +.... {reg: [ {bits: 7, name: 0x27, attr: 'VS* unit-stride'}, {bits: 5, name: 'vs3', attr: 'store data', type: 2}, @@ -71,9 +75,10 @@ Format for Vector Store Instructions under STORE-FP major opcode {bits: 1, name: 'mew'}, {bits: 3, name: 'nf'}, ]} -``` +.... -```wavedrom +[wavedrom,,svg] +.... {reg: [ {bits: 7, name: 0x27, attr: 'VSS* strided'}, {bits: 5, name: 'vs3', attr: 'store data', type: 2}, @@ -85,9 +90,10 @@ Format for Vector Store Instructions under STORE-FP major opcode {bits: 1, name: 'mew'}, {bits: 3, name: 'nf'}, ]} -``` +.... -```wavedrom +[wavedrom,,svg] +.... {reg: [ {bits: 7, name: 0x27, attr: 'VSX* indexed'}, {bits: 5, name: 'vs3', attr: 'store data', type: 2}, @@ -99,4 +105,4 @@ Format for Vector Store Instructions under STORE-FP major opcode {bits: 1, name: 'mew'}, {bits: 3, name: 'nf'}, ]} -``` +.... -- cgit v1.1 From 4354619ee0b379b6a0c057c338b06eae040809a9 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Tue, 1 Aug 2023 14:34:37 -0400 Subject: Added html references for math. Adding in references for math symbols. 
--- src/riscv-unprivileged.adoc | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/src/riscv-unprivileged.adoc b/src/riscv-unprivileged.adoc index 74c096a..aea1cc1 100644 --- a/src/riscv-unprivileged.adoc +++ b/src/riscv-unprivileged.adoc @@ -52,6 +52,11 @@ endif::[] :hide-uri-scheme: :stem: latexmath :footnote: +:le: ≤ +:ge: ≥ +:ne: ≠ +:approx: ≈ +:inf: ∞ _Contributors to all versions of the spec in alphabetical order (please contact editors to suggest corrections): Arvind, Krste Asanović, Rimas Avižienis, Jacob Bachmeyer, Christopher F. Batten, -- cgit v1.1 From 229599c55a103b811040553b63b05385fb84ad84 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Tue, 1 Aug 2023 14:45:45 -0400 Subject: Add html refs for math symbols. Add html refs for math symbols. --- src/riscv-privileged.adoc | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/src/riscv-privileged.adoc b/src/riscv-privileged.adoc index 516fc3c..20e8947 100644 --- a/src/riscv-privileged.adoc +++ b/src/riscv-privileged.adoc @@ -52,6 +52,11 @@ endif::[] :hide-uri-scheme: :stem: latexmath :footnote: +:le: ≤ +:ge: ≥ +:ne: ≠ +:approx: ≈ +:inf: ∞ _Contributors to all versions of the spec in alphabetical order (please contact editors to suggest corrections): Krste Asanović, Peter Ashenden, Rimas -- cgit v1.1 From 7fa84e23e05b723efbbd3d8a4df0f0294d174931 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Tue, 1 Aug 2023 14:52:05 -0400 Subject: Fixing up table formatting and page placement. Added float and align center and header options to all tables. --- src/v-st-ext.adoc | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc index 619492b..b2ba25b 100644 --- a/src/v-st-ext.adoc +++ b/src/v-st-ext.adoc @@ -70,7 +70,7 @@ base scalar RISC-V ISA. 
.New vector CSRs [cols="2,2,2,10"] -[%autowidth] +[%autowidth,float="center",align="center",options="header"] |=== | Address | Privilege | Name | Description @@ -197,7 +197,7 @@ VLEN/SEW elements. .vsew[2:0] (selected element width) encoding [cols="1,1,1,1"] -[%autowidth] +[%autowidth,float="center",align="center",options="header"]] |=== 3+| vsew[2:0] | SEW @@ -214,7 +214,7 @@ formally _reserved_ at this point. .Example VLEN = 128 bits [cols=">,>"] -[%autowidth] +[%autowidth,float="center",align="center",options="header"]] |=== | SEW | Elements per vector register @@ -321,7 +321,7 @@ of elements that can be operated on with a single vector instruction given the current SEW and LMUL settings as shown in the table below. [cols="1,1,1,2,2,5,5"] -[%autowidth] +[%autowidth,float="center",align="center",options="header"]] |=== 3+| vlmul[2:0] | LMUL | #groups | VLMAX | Registers grouped with register __n__ @@ -363,7 +363,7 @@ operation, as defined in Section <>. All systems must support all four options: [cols="1,1,3,3"] -[%autowidth] +[%autowidth,float="center",align="center",options="header"]] |=== | `vta` | `vma` | Tail Elements | Inactive Elements @@ -594,7 +594,7 @@ mode as specified in the following table. .vxrm encoding [cols="1,1,4,10,5"] -[%autowidth] +[%autowidth,float="center",align="center",options="header"]] |=== 2+| `vxrm[1:0]` | Abbreviation | Rounding Mode | Rounding increment, `r` @@ -627,7 +627,7 @@ in the _XLEN_-bit-wide vector control and status CSR, `vcsr`. 
.vcsr layout [cols=">2,4,10"] -[%autowidth] +[%autowidth,float="center",align="center",options="header"]] |=== | Bits | Name | Description @@ -1218,7 +1218,7 @@ fields as follows: .AVL used in `vsetvli` and `vsetvl` instructions [cols="2,2,10,10"] -[%autowidth] +[%autowidth,float="center",align="center",options="header"]] |=== | `rd` | `rs1` | AVL value | Effect on `vl` | - | !x0 | Value in `x[rs1]` | Normal stripmining @@ -3420,7 +3420,7 @@ The following table describes the instruction's behavior for all classes of floating-point inputs: [cols="1,1,1"] -[%autowidth] +[%autowidth,float="center",align="center",options="header"]] |=== | Input | Output | Exceptions raised @@ -3489,7 +3489,7 @@ The following table describes the instruction's behavior for all classes of floating-point inputs, where _B_ is the exponent bias: [cols="1,1,1,1"] -[%autowidth] +[%autowidth,float="center",align="center",options="header"]] |=== | Input (_x_) | Rounding Mode | Output (_y_ {approx} _1/x_) | Exceptions raised @@ -4966,7 +4966,7 @@ advertise hardware capabilities. .Vector length extensions [cols="1,1"] -[%autowidth] +[%autowidth,float="center",align="center",options="header"]] |=== | Extension | Minimum VLEN @@ -5003,7 +5003,7 @@ supported. .Embedded vector extensions [cols="1,1,2,1,1"] -[%autowidth] +[%autowidth,float="center",align="center",options="header"]] |=== | Extension | Minimum VLEN | Supported EEW | FP32 | FP64 -- cgit v1.1 From f395f755902e66932089e07e9b57f90678bc399e Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Tue, 1 Aug 2023 14:57:28 -0400 Subject: Deleted extraneous outside square bracket. Deleted outside square bracket from Table formatting lines. --- src/v-st-ext.adoc | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc index b2ba25b..fab0a66 100644 --- a/src/v-st-ext.adoc +++ b/src/v-st-ext.adoc @@ -197,7 +197,7 @@ VLEN/SEW elements. 
.vsew[2:0] (selected element width) encoding [cols="1,1,1,1"] -[%autowidth,float="center",align="center",options="header"]] +[%autowidth,float="center",align="center",options="header"] |=== 3+| vsew[2:0] | SEW @@ -214,7 +214,7 @@ formally _reserved_ at this point. .Example VLEN = 128 bits [cols=">,>"] -[%autowidth,float="center",align="center",options="header"]] +[%autowidth,float="center",align="center",options="header"] |=== | SEW | Elements per vector register @@ -321,7 +321,7 @@ of elements that can be operated on with a single vector instruction given the current SEW and LMUL settings as shown in the table below. [cols="1,1,1,2,2,5,5"] -[%autowidth,float="center",align="center",options="header"]] +[%autowidth,float="center",align="center",options="header"] |=== 3+| vlmul[2:0] | LMUL | #groups | VLMAX | Registers grouped with register __n__ @@ -363,7 +363,7 @@ operation, as defined in Section <>. All systems must support all four options: [cols="1,1,3,3"] -[%autowidth,float="center",align="center",options="header"]] +[%autowidth,float="center",align="center",options="header"] |=== | `vta` | `vma` | Tail Elements | Inactive Elements @@ -594,7 +594,7 @@ mode as specified in the following table. .vxrm encoding [cols="1,1,4,10,5"] -[%autowidth,float="center",align="center",options="header"]] +[%autowidth,float="center",align="center",options="header"] |=== 2+| `vxrm[1:0]` | Abbreviation | Rounding Mode | Rounding increment, `r` @@ -627,7 +627,7 @@ in the _XLEN_-bit-wide vector control and status CSR, `vcsr`. 
.vcsr layout [cols=">2,4,10"] -[%autowidth,float="center",align="center",options="header"]] +[%autowidth,float="center",align="center",options="header"] |=== | Bits | Name | Description @@ -1218,7 +1218,7 @@ fields as follows: .AVL used in `vsetvli` and `vsetvl` instructions [cols="2,2,10,10"] -[%autowidth,float="center",align="center",options="header"]] +[%autowidth,float="center",align="center",options="header"] |=== | `rd` | `rs1` | AVL value | Effect on `vl` | - | !x0 | Value in `x[rs1]` | Normal stripmining @@ -3420,7 +3420,7 @@ The following table describes the instruction's behavior for all classes of floating-point inputs: [cols="1,1,1"] -[%autowidth,float="center",align="center",options="header"]] +[%autowidth,float="center",align="center",options="header"] |=== | Input | Output | Exceptions raised @@ -3489,7 +3489,7 @@ The following table describes the instruction's behavior for all classes of floating-point inputs, where _B_ is the exponent bias: [cols="1,1,1,1"] -[%autowidth,float="center",align="center",options="header"]] +[%autowidth,float="center",align="center",options="header"] |=== | Input (_x_) | Rounding Mode | Output (_y_ {approx} _1/x_) | Exceptions raised @@ -4966,7 +4966,7 @@ advertise hardware capabilities. .Vector length extensions [cols="1,1"] -[%autowidth,float="center",align="center",options="header"]] +[%autowidth,float="center",align="center",options="header"] |=== | Extension | Minimum VLEN @@ -5003,7 +5003,7 @@ supported. .Embedded vector extensions [cols="1,1,2,1,1"] -[%autowidth,float="center",align="center",options="header"]] +[%autowidth,float="center",align="center",options="header"] |=== | Extension | Minimum VLEN | Supported EEW | FP32 | FP64 -- cgit v1.1 From 460cd1e458f5db65dc002e8a4a8ea350de15b17d Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Tue, 1 Aug 2023 15:05:44 -0400 Subject: Adding table formatting to tables in wavedrom files. Adding table formatting to tables in wavedrom files. 
--- src/images/wavedrom/vfrec7.adoc | 2 +- src/images/wavedrom/vfrsqrt7.adoc | 2 +- src/images/wavedrom/vtype-format.adoc | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/src/images/wavedrom/vfrec7.adoc b/src/images/wavedrom/vfrec7.adoc index 02abe60..d33f44e 100644 --- a/src/images/wavedrom/vfrec7.adoc +++ b/src/images/wavedrom/vfrec7.adoc @@ -1,5 +1,5 @@ .vfrec7.v common-case lookup table contents -[%autowidth] +[%autowidth,float="center",align="center",options="header"] |=== | sig[MSB -: 7] | sig_out[MSB -: 7] diff --git a/src/images/wavedrom/vfrsqrt7.adoc b/src/images/wavedrom/vfrsqrt7.adoc index ace8022..befd881 100644 --- a/src/images/wavedrom/vfrsqrt7.adoc +++ b/src/images/wavedrom/vfrsqrt7.adoc @@ -1,5 +1,5 @@ .vfrsqrt7.v common-case lookup table contents -[%autowidth] +[%autowidth,float="center",align="center",options="header"] |=== |exp[0] | sig[MSB -: 6] | sig_out[MSB -: 7] diff --git a/src/images/wavedrom/vtype-format.adoc b/src/images/wavedrom/vtype-format.adoc index 2f68a28..9e6ab34 100644 --- a/src/images/wavedrom/vtype-format.adoc +++ b/src/images/wavedrom/vtype-format.adoc @@ -15,7 +15,7 @@ general `vill` should be at bit XLEN-1. .`vtype` register layout [cols=">2,4,10"] -[%autowidth] +[%autowidth,float="center",align="center",options="header"] |=== | Bits | Name | Description -- cgit v1.1 From f23fcd664f6e6c65ad88651c1f58ba3793856868 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Tue, 1 Aug 2023 15:26:45 -0400 Subject: Fixing two inline wavedrom files. For some reason these two wavedroms are inline. Fixing them so they render correctly. 
--- src/v-st-ext.adoc | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc index fab0a66..cc3e8b1 100644 --- a/src/v-st-ext.adoc +++ b/src/v-st-ext.adoc @@ -1965,7 +1965,8 @@ Format for Vector Load Whole Register Instructions under LOAD-FP major opcode nf | mew| 00 | 1| 01000 | rs1 | width | vd |0000111| VLR //// -```wavedrom +[wavedrom,,svg] +.... {reg: [ {bits: 7, name: 0x07, attr: 'VL*R*'}, {bits: 5, name: 'vd', attr: 'destination of load', type: 2}, @@ -1977,7 +1978,7 @@ Format for Vector Load Whole Register Instructions under LOAD-FP major opcode {bits: 1, name: 'mew'}, {bits: 3, name: 'nf'}, ]} -``` +.... Format for Vector Store Whole Register Instructions under STORE-FP major opcode @@ -1986,7 +1987,8 @@ Format for Vector Store Whole Register Instructions under STORE-FP major opcode nf | 0 | 00 | 1| 01000 | rs1 | 000 | vs3 |0100111| VSR //// -```wavedrom +[wavedrom,,svg] +.... {reg: [ {bits: 7, name: 0x27, attr: 'VS*R*'}, {bits: 5, name: 'vs3', attr: 'store data', type: 2}, @@ -1998,7 +2000,7 @@ Format for Vector Store Whole Register Instructions under STORE-FP major opcode {bits: 1, name: 0x100, attr: 'mew'}, {bits: 3, name: 'nf'}, ]} -``` +.... These instructions load and store whole vector register groups. -- cgit v1.1 From 015fa413c28818eefe3cdf82d2fe0d7e61c74bf2 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Wed, 2 Aug 2023 10:29:44 -0400 Subject: Trying to fix broken table. 
--- src/images/wavedrom/vfrsqrt7.adoc | 136 +------------------------------------- 1 file changed, 2 insertions(+), 134 deletions(-) diff --git a/src/images/wavedrom/vfrsqrt7.adoc b/src/images/wavedrom/vfrsqrt7.adoc index befd881..b3a4604 100644 --- a/src/images/wavedrom/vfrsqrt7.adoc +++ b/src/images/wavedrom/vfrsqrt7.adoc @@ -1,139 +1,7 @@ .vfrsqrt7.v common-case lookup table contents -[%autowidth,float="center",align="center",options="header"] +[%autowidth,cols="<,<,<",options="header"] |=== |exp[0] | sig[MSB -: 6] | sig_out[MSB -: 7] -.64+|0 -| 0 | 52 -| 1 | 51 -| 2 | 50 -| 3 | 48 -| 4 | 47 -| 5 | 46 -| 6 | 44 -| 7 | 43 -| 8 | 42 -| 9 | 41 -| 10 | 40 -| 11 | 39 -| 12 | 38 -| 13 | 36 -| 14 | 35 -| 15 | 34 -| 16 | 33 -| 17 | 32 -| 18 | 31 -| 19 | 30 -| 20 | 30 -| 21 | 29 -| 22 | 28 -| 23 | 27 -| 24 | 26 -| 25 | 25 -| 26 | 24 -| 27 | 23 -| 28 | 23 -| 29 | 22 -| 30 | 21 -| 31 | 20 -| 32 | 19 -| 33 | 19 -| 34 | 18 -| 35 | 17 -| 36 | 16 -| 37 | 16 -| 38 | 15 -| 39 | 14 -| 40 | 14 -| 41 | 13 -| 42 | 12 -| 43 | 12 -| 44 | 11 -| 45 | 10 -| 46 | 10 -| 47 | 9 -| 48 | 9 -| 49 | 8 -| 50 | 7 -| 51 | 7 -| 52 | 6 -| 53 | 6 -| 54 | 5 -| 55 | 4 -| 56 | 4 -| 57 | 3 -| 58 | 3 -| 59 | 2 -| 60 | 2 -| 61 | 1 -| 62 | 1 -| 63 | 0 - -.64+|1 -| 0 | 127 -| 1 | 125 -| 2 | 123 -| 3 | 121 -| 4 | 119 -| 5 | 118 -| 6 | 116 -| 7 | 114 -| 8 | 113 -| 9 | 111 -| 10 | 109 -| 11 | 108 -| 12 | 106 -| 13 | 105 -| 14 | 103 -| 15 | 102 -| 16 | 100 -| 17 | 99 -| 18 | 97 -| 19 | 96 -| 20 | 95 -| 21 | 93 -| 22 | 92 -| 23 | 91 -| 24 | 90 -| 25 | 88 -| 26 | 87 -| 27 | 86 -| 28 | 85 -| 29 | 84 -| 30 | 83 -| 31 | 82 -| 32 | 80 -| 33 | 79 -| 34 | 78 -| 35 | 77 -| 36 | 76 -| 37 | 75 -| 38 | 74 -| 39 | 73 -| 40 | 72 -| 41 | 71 -| 42 | 70 -| 43 | 70 -| 44 | 69 -| 45 | 68 -| 46 | 67 -| 47 | 66 -| 48 | 65 -| 49 | 64 -| 50 | 63 -| 51 | 63 -| 52 | 62 -| 53 | 61 -| 54 | 60 -| 55 | 59 -| 56 | 59 -| 57 | 58 -| 58 | 57 -| 59 | 56 -| 60 | 56 -| 61 | 55 -| 62 | 54 -| 63 | 53 - -|=== +|=== \ No newline at end of file -- cgit 
v1.1 From 2e341cdaf5a730cf49b963a21674f7dc3c9d0293 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 10:02:02 -0400 Subject: Trying to fix huge table. Playing with table formatting. --- src/images/wavedrom/vfrsqrt7.adoc | 133 +++++++++++++++++++++++++++++++++++++- 1 file changed, 132 insertions(+), 1 deletion(-) diff --git a/src/images/wavedrom/vfrsqrt7.adoc b/src/images/wavedrom/vfrsqrt7.adoc index b3a4604..10e2958 100644 --- a/src/images/wavedrom/vfrsqrt7.adoc +++ b/src/images/wavedrom/vfrsqrt7.adoc @@ -1,7 +1,138 @@ .vfrsqrt7.v common-case lookup table contents -[%autowidth,cols="<,<,<",options="header"] +[%autowidth,float=center,align=center,cols="<,<,<",options="header"] |=== |exp[0] | sig[MSB -: 6] | sig_out[MSB -: 7] +.64+|0| 0 | 52 +| 1 | 51 +| 2 | 50 +| 3 | 48 +| 4 | 47 +| 5 | 46 +| 6 | 44 +| 7 | 43 +| 8 | 42 +| 9 | 41 +| 10 | 40 +| 11 | 39 +| 12 | 38 +| 13 | 36 +| 14 | 35 +| 15 | 34 +| 16 | 33 +| 17 | 32 +| 18 | 31 +| 19 | 30 +| 20 | 30 +| 21 | 29 +| 22 | 28 +| 23 | 27 +| 24 | 26 +| 25 | 25 +| 26 | 24 +| 27 | 23 +| 28 | 23 +| 29 | 22 +| 30 | 21 +| 31 | 20 +| 32 | 19 +| 33 | 19 +| 34 | 18 +| 35 | 17 +| 36 | 16 +| 37 | 16 +| 38 | 15 +| 39 | 14 +| 40 | 14 +| 41 | 13 +| 42 | 12 +| 43 | 12 +| 44 | 11 +| 45 | 10 +| 46 | 10 +| 47 | 9 +| 48 | 9 +| 49 | 8 +| 50 | 7 +| 51 | 7 +| 52 | 6 +| 53 | 6 +| 54 | 5 +| 55 | 4 +| 56 | 4 +| 57 | 3 +| 58 | 3 +| 59 | 2 +| 60 | 2 +| 61 | 1 +| 62 | 1 +| 63 | 0 + +.64+|1 +| 0 | 127 +| 1 | 125 +| 2 | 123 +| 3 | 121 +| 4 | 119 +| 5 | 118 +| 6 | 116 +| 7 | 114 +| 8 | 113 +| 9 | 111 +| 10 | 109 +| 11 | 108 +| 12 | 106 +| 13 | 105 +| 14 | 103 +| 15 | 102 +| 16 | 100 +| 17 | 99 +| 18 | 97 +| 19 | 96 +| 20 | 95 +| 21 | 93 +| 22 | 92 +| 23 | 91 +| 24 | 90 +| 25 | 88 +| 26 | 87 +| 27 | 86 +| 28 | 85 +| 29 | 84 +| 30 | 83 +| 31 | 82 +| 32 | 80 +| 33 | 79 +| 34 | 78 +| 35 | 77 +| 36 | 76 +| 37 | 75 +| 38 | 74 +| 39 | 73 +| 40 | 72 +| 41 | 71 +| 42 | 70 +| 43 | 70 +| 44 | 69 +| 45 | 68 +| 46 | 67 +| 47 | 66 +| 48 | 65 +| 49 | 64 
+| 50 | 63 +| 51 | 63 +| 52 | 62 +| 53 | 61 +| 54 | 60 +| 55 | 59 +| 56 | 59 +| 57 | 58 +| 58 | 57 +| 59 | 56 +| 60 | 56 +| 61 | 55 +| 62 | 54 +| 63 | 53 + |=== \ No newline at end of file -- cgit v1.1 From 2fbc9952a6cb5061adfa75499cbf46586fee82f8 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 10:32:23 -0400 Subject: Added overview to bitmanip chapter. Added overview.adoc from bitmanip repo to bitmanip chapter. --- src/b-st-ext.adoc | 463 ++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 448 insertions(+), 15 deletions(-) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index 9240f6e..d4d8080 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -1,18 +1,451 @@ [[bits]] == "B" Standard Extension for Bit Manipulation, Version 0.0 -This chapter is a placeholder for a future standard extension to provide -bit manipulation instructions, including instructions to insert, -extract, and test bit fields, and for rotations, funnel shifts, and bit -and byte permutations. -[NOTE] -==== -Although bit manipulation instructions are very effective in some -application domains, particularly when dealing with externally packed -data structures, we excluded them from the base ISAs as they are not -useful in all domains and can add additional complexity or instruction -formats to supply all needed operands. - -We anticipate the B extension will be a brownfield encoding within the -base 30-bit instruction space. -==== +[[preface]] +=== Bit-manipulation a, b, c and s extensions grouped for public review and ratification + +The bit-manipulation (bitmanip) extension collection is comprised of several component extensions to the base RISC-V architecture that are intended to provide some combination of code size reduction, performance improvement, and energy reduction. +While the instructions are intended to have general use, some instructions are more useful in some domains than others. 
+Hence, several smaller bitmanip extensions are provided, rather than one large extension. +Each of these smaller extensions is grouped by common function and use case, and each has its own Zb*-extension name. + +Each bitmanip extension includes a group of several bitmanip instructions that have similar purposes and that can often share the same logic. Some instructions are available in only one extension while others are available in several. +The instructions have mnemonics and encodings that are independent of the extensions in which they appear. +Thus, when implementing extensions with overlapping instructions, there is no redundancy in logic or encoding. + +The bitmanip extensions are defined for RV32 and RV64. +Most of the instructions are expected to be forward compatible with RV128. +While the shift-immediate instructions are defined to have at most a 6-bit immediate field, a 7th bit is available in the encoding space should this be needed for RV128. + +=== Word Instructions + +The bitmanip extension follows the convention in RV64 that _w_-suffixed instructions (without a dot before the _w_) ignore the upper 32 bits of their inputs, operate on the least-significant 32-bits as signed values and produce a 32-bit signed result that is sign-extended to XLEN. + +Bitmanip instructions with the suffix _.uw_ have one operand that is an unsigned 32-bit value that is extracted from the least significant 32 bits of the specified register. Other than that, these perform full XLEN operations. + +Bitmanip instructions with the suffix _.b_, _.h_ and _.w_ only look at the least significant 8-bits, 16-bits and 32-bits of the input (respectively) and produce an XLEN-wide result that is sign-extended or zero-extended, based on the specific instruction. + +=== Pseudocode for instruction semantics + +The semantics of each instruction in <<#insns>> is expressed in a SAIL-like syntax. 
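Illustration only (not part of the patch): the suffix conventions described under "Word Instructions" can be modelled in a few lines of Python, assuming XLEN=64. Here `rolw` stands in for a _w_-suffixed instruction and `add_uw` for a _.uw_ one. The function names, the use of the low five bits of `rs2` as the rotate amount, and the choice of `rs1` as the zero-extended operand reflect my reading of the Zbb and Zba instruction definitions, which are not included in this excerpt, so treat them as assumptions rather than normative behaviour.

```python
XLEN = 64                      # assumed register width for this sketch
MASK_XLEN = (1 << XLEN) - 1
MASK_32 = 0xFFFF_FFFF


def sext32(value: int) -> int:
    """Sign-extend the low 32 bits of value to XLEN bits.

    The result is returned as an unsigned XLEN-bit integer (a register image).
    """
    low = value & MASK_32
    if low & 0x8000_0000:
        low |= MASK_XLEN & ~MASK_32  # fill the upper bits with ones
    return low


def rolw(rs1: int, rs2: int) -> int:
    """Model of a w-suffixed op: ignore upper input bits, sign-extend result."""
    x = rs1 & MASK_32          # upper 32 bits of the input are ignored
    shamt = rs2 & 31           # assumed: rotate amount is rs2[4:0]
    rotated = ((x << shamt) | (x >> (32 - shamt))) & MASK_32
    return sext32(rotated)


def add_uw(rs1: int, rs2: int) -> int:
    """Model of a .uw op: one operand is zero-extended from 32 bits,
    the rest of the operation is a full XLEN-wide add."""
    return ((rs1 & MASK_32) + rs2) & MASK_XLEN


if __name__ == "__main__":
    print(hex(rolw(0x4000_0000, 1)))                        # bit 31 set, so upper half is ones
    print(hex(add_uw(0xFFFF_FFFF_0000_0001, 0x1_0000_0000)))
```

The point of the contrast is that the _w_ form narrows both inputs and output to 32 bits before sign-extending, whereas the _.uw_ form narrows only one operand and keeps the rest of the arithmetic at full register width.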
+ +=== Extensions + +The first group of bitmanip extensions to be released for Public Review is: + +* <<#zba>> +* <<#zbb>> +* <<#zbc>> +* <<#zbs>> + +Below is a list of all of the instructions (and pseudoinstructions) that are included in these extensions +along with their specific mapping: + +[%header,cols="^3,^3,10,16,^2,^2,^2,^2"] +|==== +|RV32 +|RV64 +|Mnemonic +|Instruction +|Zba +|Zbb +|Zbc +|Zbs + +| +|✓ +|add.uw _rd_, _rs1_, _rs2_ +|<<#insns-add_uw>> +|✓ +| +| +| + +|✓ +|✓ +|andn _rd_, _rs1_, _rs2_ +|<<#insns-andn>> +| +|✓ +| +| + + +|✓ +|✓ +|clmul _rd_, _rs1_, _rs2_ +|<<#insns-clmul>> +| +| +|✓ +| + +|✓ +|✓ +|clmulh _rd_, _rs1_, _rs2_ +|<<#insns-clmulh>> +| +| +|✓ +| + +|✓ +|✓ +|clmulr _rd_, _rs1_, _rs2_ +|<<#insns-clmulr>> +| +| +|✓ +| + +|✓ +|✓ +|clz _rd_, _rs_ +|<<#insns-clz>> +| +|✓ +| +| + +| +|✓ +|clzw _rd_, _rs_ +|<<#insns-clzw>> +| +|✓ +| +| + +|✓ +|✓ +|cpop _rd_, _rs_ +|<<#insns-cpop>> +| +|✓ +| +| + +| +|✓ +|cpopw _rd_, _rs_ +|<<#insns-cpopw>> +| +|✓ +| +| + +|✓ +|✓ +|ctz _rd_, _rs_ +|<<#insns-ctz>> +| +|✓ +| +| + +| +|✓ +|ctzw _rd_, _rs_ +|<<#insns-ctzw>> +| +|✓ +| +| + +|✓ +|✓ +|max _rd_, _rs1_, _rs2_ +|<<#insns-max>> +| +|✓ +| +| + +|✓ +|✓ +|maxu _rd_, _rs1_, _rs2_ +|<<#insns-maxu>> +| +|✓ +| +| + +|✓ +|✓ +|min _rd_, _rs1_, _rs2_ +|<<#insns-min>> +| +|✓ +| +| + +|✓ +|✓ +|minu _rd_, _rs1_, _rs2_ +|<<#insns-minu>> +| +|✓ +| +| + +|✓ +|✓ +|orc.b _rd_, _rs_ +|<<#insns-orc_b>> +| +|✓ +| +| + +|✓ +|✓ +|orn _rd_, _rs1_, _rs2_ +|<<#insns-orn>> +| +|✓ +| +| + +|✓ +|✓ +|rev8 _rd_, _rs_ +|<<#insns-rev8>> +| +|✓ +| +| + +|✓ +|✓ +|rol _rd_, _rs1_, _rs2_ +|<<#insns-rol>> +| +|✓ +| +| + +| +|✓ +|rolw _rd_, _rs1_, _rs2_ +|<<#insns-rolw>> +| +|✓ +| +| + +|✓ +|✓ +|ror _rd_, _rs1_, _rs2_ +|<<#insns-ror>> +| +|✓ +| +| + +|✓ +|✓ +|rori _rd_, _rs1_, _shamt_ +|<<#insns-rori>> +| +|✓ +| +| + +| +|✓ +|roriw _rd_, _rs1_, _shamt_ +|<<#insns-roriw>> +| +|✓ +| +| + +| +|✓ +|rorw _rd_, _rs1_, _rs2_ +|<<#insns-rorw>> +| +|✓ +| +| + +|✓ +|✓ +|bclr _rd_, _rs1_, _rs2_
+|<<#insns-bclr>> +| +| +| +|✓ + +|✓ +|✓ +|bclri _rd_, _rs1_, _imm_ +|<<#insns-bclri>> +| +| +| +|✓ + +|✓ +|✓ +|bext _rd_, _rs1_, _rs2_ +|<<#insns-bext>> +| +| +| +|✓ + +|✓ +|✓ +|bexti _rd_, _rs1_, _imm_ +|<<#insns-bexti>> +| +| +| +|✓ + +|✓ +|✓ +|binv _rd_, _rs1_, _rs2_ +|<<#insns-binv>> +| +| +| +|✓ + +|✓ +|✓ +|binvi _rd_, _rs1_, _imm_ +|<<#insns-binvi>> +| +| +| +|✓ + +|✓ +|✓ +|bset _rd_, _rs1_, _rs2_ +|<<#insns-bset>> +| +| +| +|✓ + +|✓ +|✓ +|bseti _rd_, _rs1_, _imm_ +|<<#insns-bseti>> +| +| +| +|✓ + +|✓ +|✓ +|sext.b _rd_, _rs_ +|<<#insns-sext_b>> +| +|✓ +| +| + +|✓ +|✓ +|sext.h _rd_, _rs_ +|<<#insns-sext_h>> +| +|✓ +| +| + +|✓ +|✓ +|sh1add _rd_, _rs1_, _rs2_ +|<<#insns-sh1add>> +|✓ +| +| +| + +| +|✓ +|sh1add.uw _rd_, _rs1_, _rs2_ +|<<#insns-sh1add_uw>> +|✓ +| +| +| + +|✓ +|✓ +|sh2add _rd_, _rs1_, _rs2_ +|<<#insns-sh2add>> +|✓ +| +| +| + +| +|✓ +|sh2add.uw _rd_, _rs1_, _rs2_ +|<<#insns-sh2add_uw>> +|✓ +| +| +| + +|✓ +|✓ +|sh3add _rd_, _rs1_, _rs2_ +|<<#insns-sh3add>> +|✓ +| +| +| + +| +|✓ +|sh3add.uw _rd_, _rs1_, _rs2_ +|<<#insns-sh3add_uw>> +|✓ +| +| +| + +| +|✓ +|slli.uw _rd_, _rs1_, _imm_ +|<<#insns-slli_uw>> +|✓ +| +| +| + +|✓ +|✓ +|xnor _rd_, _rs1_, _rs2_ +|<<#insns-xnor>> +| +|✓ +| +| + +|✓ +|✓ +|zext.h _rd_, _rs_ +|<<#insns-zext_h>> +| +|✓ +| +| + +| +|✓ +|zext.w _rd_, _rs_ +|<<#insns-add_uw>> +|✓ +| +| +| + +|==== -- cgit v1.1 From e9deaa6e3a38f3bd37dde07a2c029ac0baa547fe Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 10:38:36 -0400 Subject: Added zba to bitmanip chapter. Added zba.adoc to the bitmanip chapter. 
--- src/b-st-ext.adoc | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 73 insertions(+) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index d4d8080..b692e25 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -449,3 +449,76 @@ along with their specific mapping: | |==== + +[#zba,reftext=Address generation instructions] +==== Zba extension + +[NOTE,caption=Frozen] +==== +The Zba extension is frozen. +==== + +The Zba instructions can be used to accelerate the generation of addresses that index into arrays of basic types (halfword, word, doubleword) using both unsigned word-sized and XLEN-sized indices: a shifted index is added to a base address. + +The shift and add instructions do a left shift of 1, 2, or 3 because these are commonly found in real-world code and because they can be implemented with a minimal amount of additional hardware beyond that of the simple adder. This avoids lengthening the critical path in implementations. + +While the shift and add instructions are limited to a maximum left shift of 3, the slli instruction (from the base ISA) can be used to perform similar shifts for indexing into arrays of wider elements. The slli.uw -- added in this extension -- can be used when the index is to be interpreted as an unsigned word. 
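
The address-generation pattern described above can be sketched in Python as follows. This is an illustrative model rather than the normative definition: it assumes RV64, assumes that _rs1_ carries the index and _rs2_ the base (as in the `add.uw` Operation block), and uses the hypothetical helper names `shnadd`, `shnadd_uw`, `add_uw` and `slli_uw` to stand for the corresponding instructions.

```python
XLEN = 64                 # assumed register width for this sketch (RV64)
MASK = (1 << XLEN) - 1    # results wrap around at XLEN bits
WORD = 0xFFFFFFFF         # selects the least-significant word of a register


def shnadd(n, rs1, rs2):
    """Model of sh1add/sh2add/sh3add (n = 1, 2, 3): base rs2 plus index rs1 shifted left by n."""
    return (rs2 + (rs1 << n)) & MASK


def add_uw(rs1, rs2):
    """Model of add.uw: base rs2 plus the zero-extended low word of rs1."""
    return (rs2 + (rs1 & WORD)) & MASK


def shnadd_uw(n, rs1, rs2):
    """Model of shNadd.uw: like shnadd, but the index is the zero-extended low word of rs1."""
    return (rs2 + ((rs1 & WORD) << n)) & MASK


def slli_uw(rs1, shamt):
    """Model of slli.uw: zero-extend the low word of rs1, then shift left by shamt."""
    return ((rs1 & WORD) << shamt) & MASK


# Address of element 5 in an array of doublewords (8 bytes each) at 0x1000.
doubleword_addr = shnadd(3, 5, 0x1000)

# Same idea with a 32-bit unsigned index whose upper register bits hold stale data.
word_addr = shnadd_uw(2, 0xFFFFFFFF00000003, 0x2000)

# Elements wider than 8 bytes need a separate shift followed by an ordinary add.
wide_addr = (slli_uw(0xFFFFFFFF00000005, 4) + 0x1000) & MASK
```

The last case shows why `slli.uw` is useful: a 16-byte stride is outside the range of the shift-and-add instructions, so the unsigned index is scaled first and added to the base afterwards.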
+ +The following instructions (and pseudoinstructions) comprise the Zba extension: + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +| +|✓ +|add.uw _rd_, _rs1_, _rs2_ +|<<#insns-add_uw>> + +|✓ +|✓ +|sh1add _rd_, _rs1_, _rs2_ +|<<#insns-sh1add>> + +| +|✓ +|sh1add.uw _rd_, _rs1_, _rs2_ +|<<#insns-sh1add_uw>> + +|✓ +|✓ +|sh2add _rd_, _rs1_, _rs2_ +|<<#insns-sh2add>> + +| +|✓ +|sh2add.uw _rd_, _rs1_, _rs2_ +|<<#insns-sh2add_uw>> + +|✓ +|✓ +|sh3add _rd_, _rs1_, _rs2_ +|<<#insns-sh3add>> + +| +|✓ +|sh3add.uw _rd_, _rs1_, _rs2_ +|<<#insns-sh3add_uw>> + +| +|✓ +|slli.uw _rd_, _rs1_, _imm_ +|<<#insns-slli_uw>> + +| +|✓ +|zext.w _rd_, _rs_ +|<<#insns-add_uw>> + +|=== + + + -- cgit v1.1 From cd539c396dc3571c38cc28b0e1c195ec39f95047 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 10:45:22 -0400 Subject: Added in zbb to bitmanip chapter Added zbb.adoc to bitmanip chapter. --- src/b-st-ext.adoc | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index b692e25..a4d03c1 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -520,5 +520,11 @@ The following instructions (and pseudoinstructions) comprise the Zba extension: |=== +[#zbb,reftext="Basic bit-manipulation"] +==== Zbb: Basic bit-manipulation +[NOTE,caption=Frozen] +==== +The Zbb extension is frozen. +==== -- cgit v1.1 From da9445f12b64bd9371f9db6f0d69d51649e1ea15 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 10:51:09 -0400 Subject: Added logical with negate to bitmanip chapter. Added _logical-with-negate.adoc to bitmanip chapter. --- src/b-st-ext.adoc | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index a4d03c1..7bc9a34 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -527,4 +527,35 @@ The following instructions (and pseudoinstructions) comprise the Zba extension: ==== The Zbb extension is frozen. 
==== +===== Logical with negate + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|andn _rd_, _rs1_, _rs2_ +|<<#insns-andn>> + +|✓ +|✓ +|orn _rd_, _rs1_, _rs2_ +|<<#insns-orn>> + +|✓ +|✓ +|xnor _rd_, _rs1_, _rs2_ +|<<#insns-xnor>> +|=== + +.Implementation Hint +[NOTE, caption="Imp" ] +=============================================================== +The Logical with Negate instructions can be implemented by inverting the _rs2_ inputs to the base-required AND, OR, and XOR logic instructions. +In some implementations, the inverter on rs2 used for subtraction can be reused for this purpose. +=============================================================== -- cgit v1.1 From 055502525d5ae60304a21836d66433f8874fa00b Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 10:56:26 -0400 Subject: Adding count bits to bitmanip Adding _count_bits.adoc into bitmanip chapter. --- src/b-st-ext.adoc | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index 7bc9a34..c08f326 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -559,3 +559,33 @@ The Logical with Negate instructions can be implemented by inverting the _rs2_ i In some implementations, the inverter on rs2 used for subtraction can be reused for this purpose. =============================================================== +===== Count leading/trailing zero bits + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|clz _rd_, _rs_ +|<<#insns-clz>> + +| +|✓ +|clzw _rd_, _rs_ +|<<#insns-clzw>> + +|✓ +|✓ +|ctz _rd_, _rs_ +|<<#insns-ctz>> + +| +|✓ +|ctzw _rd_, _rs_ +|<<#insns-ctzw>> +|=== + -- cgit v1.1 From 2480df321bafdcc79629b064530e0d60afc0ad2e Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 10:58:09 -0400 Subject: Added popcount to bitmanip chapter. Adding _popcount.adoc to bitmanip chapter. 
--- src/b-st-ext.adoc | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index c08f326..d325149 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -589,3 +589,27 @@ In some implementations, the inverter on rs2 used for subtraction can be reused |<<#insns-ctzw>> |=== +===== Count population + +These instructions count the number of set bits (1-bits). This is also +commonly referred to as population count. + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|cpop _rd_, _rs_ +|<<#insns-cpop>> + +| +|✓ +|cpopw _rd_, _rs_ +|<<#insns-cpopw>> +|=== + + -- cgit v1.1 From aa537e0c4e800e8cd3f87fdb7d75e90bc328b957 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 10:59:22 -0400 Subject: Added max min to bitmanip chapter. Adding _max_min.adoc into bitmanip chapter. --- src/b-st-ext.adoc | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index d325149..9e0dee7 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -612,4 +612,37 @@ commonly referred to as population count. |<<#insns-cpopw>> |=== +===== Integer minimum/maximum + +The integer minimum/maximum instructions are arithmetic R-type +instructions that return the smaller/larger of two operands. + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|max _rd_, _rs1_, _rs2_ +|<<#insns-max>> + +|✓ +|✓ +|maxu _rd_, _rs1_, _rs2_ +|<<#insns-maxu>> + +|✓ +|✓ +|min _rd_, _rs1_, _rs2_ +|<<#insns-min>> + +|✓ +|✓ +|minu _rd_, _rs1_, _rs2_ +|<<#insns-minu>> +|=== + -- cgit v1.1 From be406c1e0dade9ffc1f93942cf7f65c9c2af844a Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 11:00:54 -0400 Subject: Added sign extend to bitmanip chapter. Adding _signextend.adoc to bitmanip chapter. 
--- src/b-st-ext.adoc | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index 9e0dee7..ce5da5b 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -645,4 +645,33 @@ instructions that return the smaller/larger of two operands. |<<#insns-minu>> |=== +===== Sign- and zero-extension + +These instructions perform the sign-extension or zero-extension of the least significant 8 bits, 16 bits or 32 bits of the source register. + +These instructions replace the generalized idioms `slli rD,rS,(XLEN-<size>) + srli` (for zero-extension) or `slli + srai` (for sign-extension) for the sign-extension of 8-bit and 16-bit quantities, and for the zero-extension of 16-bit and 32-bit quantities. + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|sext.b _rd_, _rs_ +|<<#insns-sext_b>> + +|✓ +|✓ +|sext.h _rd_, _rs_ +|<<#insns-sext_h>> + +|✓ +|✓ +|zext.h _rd_, _rs_ +|<<#insns-zext_h>> +|=== + -- cgit v1.1 From 189980465013027ee4441ccacacf6551f9e11bf6 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 11:02:12 -0400 Subject: Added bitwise rotation to bitmanip chapter. Adding _rotate.adoc to bitmanip chapter. --- src/b-st-ext.adoc | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index ce5da5b..3d1a04a 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -674,4 +674,56 @@ These instructions replace the generalized idioms `slli rD,rS,(XLEN-<size>) + sr |<<#insns-zext_h>> |=== +===== Bitwise rotation + +Bitwise rotation instructions are similar to the shift-logical operations from the base spec. However, where the shift-logical +instructions shift in zeros, the rotate instructions shift in the bits that were shifted out of the other side of the value. +Such operations are also referred to as ‘circular shifts’.
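
The difference between a logical shift and a rotation can be sketched in Python as below. This is a minimal model under stated assumptions: the function names `srl`, `rol` and `ror` and the `xlen` parameter are chosen for this example, the shift amount is taken from the low log2(XLEN) bits of the second operand, and the sign-extension performed by the _w_-suffixed forms is not modelled.

```python
def srl(rs1, rs2, xlen=64):
    """Logical right shift: vacated high bits are filled with zeros."""
    mask = (1 << xlen) - 1
    shamt = rs2 & (xlen - 1)
    return (rs1 & mask) >> shamt


def rol(rs1, rs2, xlen=64):
    """Rotate left: bits shifted out of the top re-enter at the bottom."""
    mask = (1 << xlen) - 1
    shamt = rs2 & (xlen - 1)      # only the low log2(xlen) bits select the amount
    value = rs1 & mask
    return ((value << shamt) | (value >> (xlen - shamt))) & mask


def ror(rs1, rs2, xlen=64):
    """Rotate right: bits shifted out of the bottom re-enter at the top."""
    mask = (1 << xlen) - 1
    shamt = rs2 & (xlen - 1)
    value = rs1 & mask
    return ((value >> shamt) | (value << (xlen - shamt))) & mask
```

For example, shifting `0x3` right by one with `srl` drops the low bit, whereas `ror` moves it to the most-significant position, and a subsequent `rol` by one restores the original value.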
+ + + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|rol _rd_, _rs1_, _rs2_ +|<<#insns-rol>> + +| +|✓ +|rolw _rd_, _rs1_, _rs2_ +|<<#insns-rolw>> + +|✓ +|✓ +|ror _rd_, _rs1_, _rs2_ +|<<#insns-ror>> + +|✓ +|✓ +|rori _rd_, _rs1_, _shamt_ +|<<#insns-rori>> + +| +|✓ +|roriw _rd_, _rs1_, _shamt_ +|<<#insns-roriw>> + +| +|✓ +|rorw _rd_, _rs1_, _rs2_ +|<<#insns-rorw>> +|=== + +.Architecture Explanation +[NOTE, caption="AE" ] +=============================================================== +The rotate instructions were included to replace a common +four-instruction sequence to achieve the same effect (neg; sll/srl; srl/sll; or) +=============================================================== -- cgit v1.1 From 690902d1f3fd22f273255e1c3dad2d8c7dfaface Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 11:04:48 -0400 Subject: Adding OR combine to bitmanip chapter. Adding _OR_combine.adoc to bitmanip chapter. --- src/b-st-ext.adoc | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index 3d1a04a..99df2b9 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -727,3 +727,21 @@ The rotate instructions were included to replace a common four-instruction sequence to achieve the same effect (neg; sll/srl; srl/sll; or) =============================================================== +===== OR Combine + +*orc.b* sets the bits of each byte in the result _rd_ to all zeros if no bit within the respective byte of _rs_ is set, or to all ones if any bit within the respective byte of _rs_ is set. + +One use-case is string-processing functions, such as *strlen* and *strcpy*, which can use *orc.b* to test for the terminating zero byte by counting the set bits in leading non-zero bytes in a word. 
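
One way to picture the string-processing use of *orc.b* is the Python sketch below, which locates the first zero byte in a little-endian word. It is an illustrative model only: RV64 is assumed, `orc_b` and `ctz` are stand-ins for the corresponding instructions, and `first_zero_byte` is a hypothetical helper showing one possible instruction sequence (orc.b, bitwise inversion, count trailing zeros, shift right by 3), not the only one.

```python
XLEN = 64                 # assumed register width for this sketch (RV64)
MASK = (1 << XLEN) - 1


def orc_b(rs):
    """Model of orc.b: each result byte is 0xFF if the source byte is non-zero, else 0x00."""
    result = 0
    for shift in range(0, XLEN, 8):
        if (rs >> shift) & 0xFF:
            result |= 0xFF << shift
    return result


def ctz(rs):
    """Model of ctz: number of zero bits below the lowest set bit, or XLEN if rs is zero."""
    rs &= MASK
    if rs == 0:
        return XLEN
    return (rs & -rs).bit_length() - 1


def first_zero_byte(word):
    """Byte index of the first zero byte in a little-endian word, or XLEN // 8 if there is none."""
    zero_bytes = ~orc_b(word) & MASK   # 0xFF exactly where the source byte was zero
    return ctz(zero_bytes) >> 3        # eight bits are counted per leading non-zero byte


# The terminator of "abc" sits at byte offset 3 of this doubleword.
offset = first_zero_byte(int.from_bytes(b"abc\0defg", "little"))
```

A `strlen`-style loop could add `XLEN // 8` to its running length while `first_zero_byte` reports no terminator, then add the returned offset once one is found.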
+ +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|orc.b _rd_, _rs_ +|<<#insns-orc_b>> +|=== -- cgit v1.1 From aa4327f7215607b31415a2b6736af100ef639152 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 11:06:17 -0400 Subject: Added Byte reverse to bitmanip chapter. Adding _byteswap.adoc to bitmanip chapter. --- src/b-st-ext.adoc | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index 99df2b9..ac0e91c 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -745,3 +745,22 @@ One use-case is string-processing functions, such as *strlen* and *strcpy*, whic |orc.b _rd_, _rs_ |<<#insns-orc_b>> |=== + +===== Byte-reverse + +*rev8* reverses the byte-ordering of _rs_. + +[%header,cols="^1,^1,4,8"] +|==== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|rev8 _rd_, _rs_ +|<<#insns-rev8>> + +|==== + -- cgit v1.1 From 83c2e344eb4d55954a651db81220f12bbeaba084 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 11:14:47 -0400 Subject: Adding zbc to bitmanip chapter. Adding zbc.adoc to bitmanip chapter. --- src/b-st-ext.adoc | 37 +++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index ac0e91c..9273008 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -764,3 +764,40 @@ One use-case is string-processing functions, such as *strlen* and *strcpy*, whic |==== +[#zbc,reftext="Carry-less multiplication"] +==== Zbc: Carry-less multiplication + +[NOTE,caption=Frozen] +==== +The Zbc extension is frozen. +==== + +Carry-less multiplication is the multiplication in the polynomial ring over GF(2). + +*clmul* produces the lower half of the carry-less product and *clmulh* produces the upper half of the 2✕XLEN carry-less product. + +*clmulr* produces bits 2✕XLEN−2:XLEN-1 of the 2✕XLEN carry-less product. 
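
The relationship between the three carry-less multiply results can be sketched in Python as follows. This is a minimal model assuming RV64; `clmul_full` is a hypothetical helper that forms the complete 2✕XLEN product, and the three wrappers select the bit ranges named in the text above.

```python
XLEN = 64                 # assumed register width for this sketch (RV64)
MASK = (1 << XLEN) - 1


def clmul_full(a, b):
    """Full carry-less product: shifted copies of `a` are combined with XOR instead of addition."""
    a &= MASK
    b &= MASK
    product = 0
    for i in range(XLEN):
        if (b >> i) & 1:
            product ^= a << i
    return product            # occupies at most bits 2*XLEN-2 .. 0


def clmul(rs1, rs2):
    """Lower half of the product: bits XLEN-1 .. 0."""
    return clmul_full(rs1, rs2) & MASK


def clmulh(rs1, rs2):
    """Upper half of the product: bits 2*XLEN-1 .. XLEN."""
    return (clmul_full(rs1, rs2) >> XLEN) & MASK


def clmulr(rs1, rs2):
    """Bits 2*XLEN-2 .. XLEN-1 of the product."""
    return (clmul_full(rs1, rs2) >> (XLEN - 1)) & MASK
```

Reading operands as polynomials over GF(2), `clmul(0b11, 0b11)` multiplies (x + 1) by itself; because the two middle terms cancel under XOR, the result is x^2 + 1, that is `0b101`.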
+ +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|clmul _rd_, _rs1_, _rs2_ +|<<#insns-clmul>> + +|✓ +|✓ +|clmulh _rd_, _rs1_, _rs2_ +|<<#insns-clmulh>> + +|✓ +|✓ +|clmulr _rd_, _rs1_, _rs2_ +|<<#insns-clmulr>> + +|=== -- cgit v1.1 From 7c5def0f712a4fa0591d24faee8bcc125decdf37 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 11:16:16 -0400 Subject: Adding zbs to bitmanip chapter Adding zbs.adoc to bitmanip chapter. --- src/b-st-ext.adoc | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index 9273008..f967a3f 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -801,3 +801,63 @@ Carry-less multiplication is the multiplication in the polynomial ring over GF(2 |<<#insns-clmulr>> |=== + +[#zbs,reftext="Single-bit instructions"] +==== Zbs: Single-bit instructions + +[NOTE,caption=Frozen] +==== +The Zbs extension is frozen. +==== + +The single-bit instructions provide a mechanism to set, clear, invert, or extract +a single bit in a register. The bit is specified by its index. + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|bclr _rd_, _rs1_, _rs2_ +|<<#insns-bclr>> + +|✓ +|✓ +|bclri _rd_, _rs1_, _imm_ +|<<#insns-bclri>> + +|✓ +|✓ +|bext _rd_, _rs1_, _rs2_ +|<<#insns-bext>> + +|✓ +|✓ +|bexti _rd_, _rs1_, _imm_ +|<<#insns-bexti>> + +|✓ +|✓ +|binv _rd_, _rs1_, _rs2_ +|<<#insns-binv>> + +|✓ +|✓ +|binvi _rd_, _rs1_, _imm_ +|<<#insns-binvi>> + +|✓ +|✓ +|bset _rd_, _rs1_, _rs2_ +|<<#insns-bset>> + +|✓ +|✓ +|bseti _rd_, _rs1_, _imm_ +|<<#insns-bseti>> + +|=== -- cgit v1.1 From 782de747deba84d1099e917ed3946997c3dc3de4 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 11:17:34 -0400 Subject: Adding zbkc to bitmanip. Adding zbkc.adoc to bitmanip chapter. 
--- src/b-st-ext.adoc | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index f967a3f..21eb837 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -861,3 +861,37 @@ a single bit in a register. The bit is specified by its index. |<<#insns-bseti>> |=== + +[#zbkc,reftext="Carry-less multiplication for Cryptography"] +==== Zbkc: Carry-less multiplication for Cryptography + +[NOTE,caption=Frozen] +==== +The Zbkc extension is frozen. +==== + +Carry-less multiplication is the multiplication in the polynomial ring over +GF(2). This is a critical operation in some cryptographic workloads, +particularly the AES-GCM authenticated encryption scheme. +This extension provides only the instructions needed to +efficiently implement the GHASH operation, which is part of this workload. + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|clmul _rd_, _rs1_, _rs2_ +|<<#insns-clmul>> + +|✓ +|✓ +|clmulh _rd_, _rs1_, _rs2_ +|<<#insns-clmulh>> + +|=== + -- cgit v1.1 From 1280aea0533c24b112699098a8d6fbdfbc285ab4 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 11:18:46 -0400 Subject: Adding zbkx to bitmanip. Adding zbkx.adoc to bitmanip chapter. --- src/b-st-ext.adoc | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index 21eb837..7f90915 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -895,3 +895,42 @@ efficiently implement the GHASH operation, which is part of this workload. |=== +[#zbkx,reftext="Crossbar permutations"] +==== Zbkx: Crossbar permutations + +[NOTE,caption=Frozen] +==== +The Zbkx extension is frozen. +==== + +These instructions implement a "lookup table" for 4 and 8 bit elements +inside the general purpose registers. +_rs1_ is used as a vector of N-bit words, and _rs2_ as a vector of N-bit +indices into _rs1_. 
+Each element of the result is the element of _rs1_ selected by the corresponding index in _rs2_, or zero +if that index is out of bounds. + +These instructions are useful for expressing N-bit to N-bit boolean +operations, and implementing cryptographic code with +secret-dependent memory accesses (particularly SBoxes) such that the execution +latency does not depend on the (secret) data being operated on. + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|xperm.n _rd_, _rs1_, _rs2_ +|<<#insns-xpermn>> + +|✓ +|✓ +|xperm.b _rd_, _rs1_, _rs2_ +|<<#insns-xpermb>> + +|=== + -- cgit v1.1 From ef0efbe433efa967b3c225b72408c713f7530b09 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 11:20:12 -0400 Subject: Adding zbkb to bitmanip Adding zbkb.adoc to bitmanip chapter. --- src/b-st-ext.adoc | 100 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 100 insertions(+) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index 7f90915..194afe8 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -934,3 +934,103 @@ latency does not depend on the (secret) data being operated on. |=== +[#zbkb,reftext="Bit-manipulation for Cryptography"] +==== Zbkb: Bit-manipulation for Cryptography + +[NOTE,caption=Frozen] +==== +The Zbkb extension is frozen. +==== + +This extension contains instructions essential for implementing +common operations in cryptographic workloads.
+ +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + + +| ✓ +| ✓ +| rol +| <> + +| +| ✓ +| rolw +| <> + +| ✓ +| ✓ +| ror +| <> + +| ✓ +| ✓ +| rori +| <> + +| +| ✓ +| roriw +| <> + +| +| ✓ +| rorw +| <> + +| ✓ +| ✓ +| andn +| <> + +| ✓ +| ✓ +| orn +| <> + +| ✓ +| ✓ +| xnor +| <> + +| ✓ +| ✓ +| pack +| <> + +| ✓ +| ✓ +| packh +| <> + +| +| ✓ +| packw +| <> + +| ✓ +| ✓ +| rev.b +| <> + +| ✓ +| ✓ +| rev8 +| <> + +| ✓ +| +| zip +| <> + +| ✓ +| +| unzip +| <> + +|=== -- cgit v1.1 From efe57a03b06f394ea78460674f60cd39f8517ba9 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 11:49:50 -0400 Subject: Added all instructions to bitmanip chapter. Added all instructions in alphabetical order to bitmanip chapter. --- src/b-st-ext.adoc | 104 +++++++++++++++++++++++++++++++++++++++++++++++ src/insns/add_uw.adoc | 50 +++++++++++++++++++++++ src/insns/andn.adoc | 47 +++++++++++++++++++++ src/insns/bclr.adoc | 45 ++++++++++++++++++++ src/insns/bclri.adoc | 59 +++++++++++++++++++++++++++ src/insns/bext.adoc | 46 +++++++++++++++++++++ src/insns/bexti.adoc | 59 +++++++++++++++++++++++++++ src/insns/binv.adoc | 45 ++++++++++++++++++++ src/insns/binvi.adoc | 59 +++++++++++++++++++++++++++ src/insns/bset.adoc | 44 ++++++++++++++++++++ src/insns/bseti.adoc | 59 +++++++++++++++++++++++++++ src/insns/clmul.adoc | 57 ++++++++++++++++++++++++++ src/insns/clmulh.adoc | 57 ++++++++++++++++++++++++++ src/insns/clmulr.adoc | 63 ++++++++++++++++++++++++++++ src/insns/clz.adoc | 52 ++++++++++++++++++++++++ src/insns/clzw.adoc | 53 ++++++++++++++++++++++++ src/insns/cpop.adoc | 56 +++++++++++++++++++++++++ src/insns/cpopw.adoc | 49 ++++++++++++++++++++++ src/insns/ctz.adoc | 53 ++++++++++++++++++++++++ src/insns/ctzw.adoc | 52 ++++++++++++++++++++++++ src/insns/max.adoc | 60 +++++++++++++++++++++++++++ src/insns/maxu.adoc | 50 +++++++++++++++++++++++ src/insns/min.adoc | 50 +++++++++++++++++++++++ src/insns/minu.adoc | 50 +++++++++++++++++++++++ 
src/insns/orc_b.adoc | 51 +++++++++++++++++++++++ src/insns/orn.adoc | 47 +++++++++++++++++++++ src/insns/pack.adoc | 46 +++++++++++++++++++++ src/insns/packh.adoc | 47 +++++++++++++++++++++ src/insns/packw.adoc | 49 ++++++++++++++++++++++ src/insns/rev8.adoc | 82 +++++++++++++++++++++++++++++++++++++ src/insns/revb.adoc | 46 +++++++++++++++++++++ src/insns/rol.adoc | 52 ++++++++++++++++++++++++ src/insns/rolw.adoc | 51 +++++++++++++++++++++++ src/insns/ror.adoc | 52 ++++++++++++++++++++++++ src/insns/rori.adoc | 66 ++++++++++++++++++++++++++++++ src/insns/roriw.adoc | 54 ++++++++++++++++++++++++ src/insns/rorw.adoc | 51 +++++++++++++++++++++++ src/insns/sext_b.adoc | 43 ++++++++++++++++++++ src/insns/sext_h.adoc | 43 ++++++++++++++++++++ src/insns/sh1add.adoc | 46 +++++++++++++++++++++ src/insns/sh1add_uw.adoc | 46 +++++++++++++++++++++ src/insns/sh2add.adoc | 43 ++++++++++++++++++++ src/insns/sh2add_uw.adoc | 48 ++++++++++++++++++++++ src/insns/sh3add.adoc | 42 +++++++++++++++++++ src/insns/sh3add_uw.adoc | 45 ++++++++++++++++++++ src/insns/slli_uw.adoc | 51 +++++++++++++++++++++++ src/insns/unzip.adoc | 60 +++++++++++++++++++++++++++ src/insns/xnor.adoc | 47 +++++++++++++++++++++ src/insns/xpermb.adoc | 60 +++++++++++++++++++++++++++ src/insns/xpermn.adoc | 60 +++++++++++++++++++++++++++ src/insns/zext_h.adoc | 61 +++++++++++++++++++++++++++ src/insns/zip.adoc | 60 +++++++++++++++++++++++++++ 52 files changed, 2768 insertions(+) create mode 100644 src/insns/add_uw.adoc create mode 100644 src/insns/andn.adoc create mode 100644 src/insns/bclr.adoc create mode 100644 src/insns/bclri.adoc create mode 100644 src/insns/bext.adoc create mode 100644 src/insns/bexti.adoc create mode 100644 src/insns/binv.adoc create mode 100644 src/insns/binvi.adoc create mode 100644 src/insns/bset.adoc create mode 100644 src/insns/bseti.adoc create mode 100644 src/insns/clmul.adoc create mode 100644 src/insns/clmulh.adoc create mode 100644 src/insns/clmulr.adoc create mode 100644 
src/insns/clz.adoc create mode 100644 src/insns/clzw.adoc create mode 100644 src/insns/cpop.adoc create mode 100644 src/insns/cpopw.adoc create mode 100644 src/insns/ctz.adoc create mode 100644 src/insns/ctzw.adoc create mode 100644 src/insns/max.adoc create mode 100644 src/insns/maxu.adoc create mode 100644 src/insns/min.adoc create mode 100644 src/insns/minu.adoc create mode 100644 src/insns/orc_b.adoc create mode 100644 src/insns/orn.adoc create mode 100644 src/insns/pack.adoc create mode 100644 src/insns/packh.adoc create mode 100644 src/insns/packw.adoc create mode 100644 src/insns/rev8.adoc create mode 100644 src/insns/revb.adoc create mode 100644 src/insns/rol.adoc create mode 100644 src/insns/rolw.adoc create mode 100644 src/insns/ror.adoc create mode 100644 src/insns/rori.adoc create mode 100644 src/insns/roriw.adoc create mode 100644 src/insns/rorw.adoc create mode 100644 src/insns/sext_b.adoc create mode 100644 src/insns/sext_h.adoc create mode 100644 src/insns/sh1add.adoc create mode 100644 src/insns/sh1add_uw.adoc create mode 100644 src/insns/sh2add.adoc create mode 100644 src/insns/sh2add_uw.adoc create mode 100644 src/insns/sh3add.adoc create mode 100644 src/insns/sh3add_uw.adoc create mode 100644 src/insns/slli_uw.adoc create mode 100644 src/insns/unzip.adoc create mode 100644 src/insns/xnor.adoc create mode 100644 src/insns/xpermb.adoc create mode 100644 src/insns/xpermn.adoc create mode 100644 src/insns/zext_h.adoc create mode 100644 src/insns/zip.adoc diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index 194afe8..de8317e 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -1034,3 +1034,107 @@ common operations in cryptographic workloads. 
| <> |=== + +[#insns,reftext="Instructions (in alphabetical order)"] +=== Instructions (in alphabetical order) +include::insns/add_uw.adoc[] +<<< +include::insns/andn.adoc[] +<<< +include::insns/bclr.adoc[] +<<< +include::insns/bclri.adoc[] +<<< +include::insns/bext.adoc[] +<<< +include::insns/bexti.adoc[] +<<< +include::insns/binv.adoc[] +<<< +include::insns/binvi.adoc[] +<<< +include::insns/bset.adoc[] +<<< +include::insns/bseti.adoc[] +<<< +include::insns/clmul.adoc[] +<<< +include::insns/clmulh.adoc[] +<<< +include::insns/clmulr.adoc[] +<<< +include::insns/clz.adoc[] +<<< +include::insns/clzw.adoc[] +<<< +include::insns/cpop.adoc[] +<<< +include::insns/cpopw.adoc[] +<<< +include::insns/ctz.adoc[] +<<< +include::insns/ctzw.adoc[] +<<< +include::insns/max.adoc[] +<<< +include::insns/maxu.adoc[] +<<< +include::insns/min.adoc[] +<<< +include::insns/minu.adoc[] +<<< +include::insns/orc_b.adoc[] +<<< +include::insns/orn.adoc[] +<<< +include::insns/pack.adoc[] +<<< +include::insns/packh.adoc[] +<<< +include::insns/packw.adoc[] +<<< +include::insns/rev8.adoc[] +<<< +include::insns/revb.adoc[] +<<< +include::insns/rol.adoc[] +<<< +include::insns/rolw.adoc[] +<<< +include::insns/ror.adoc[] +<<< +include::insns/rori.adoc[] +<<< +include::insns/roriw.adoc[] +<<< +include::insns/rorw.adoc[] +<<< +include::insns/sext_b.adoc[] +<<< +include::insns/sext_h.adoc[] +<<< +include::insns/sh1add.adoc[] +<<< +include::insns/sh1add_uw.adoc[] +<<< +include::insns/sh2add.adoc[] +<<< +include::insns/sh2add_uw.adoc[] +<<< +include::insns/sh3add.adoc[] +<<< +include::insns/sh3add_uw.adoc[] +<<< +include::insns/slli_uw.adoc[] +<<< +include::insns/unzip.adoc[] +<<< +include::insns/xnor.adoc[] +<<< +include::insns/xpermb.adoc[] +<<< +include::insns/xpermn.adoc[] +<<< +include::insns/zext_h.adoc[] +<<< +include::insns/zip.adoc[] \ No newline at end of file diff --git a/src/insns/add_uw.adoc b/src/insns/add_uw.adoc new file mode 100644 index 0000000..a8b9588 --- /dev/null +++ 
b/src/insns/add_uw.adoc @@ -0,0 +1,50 @@ +[#insns-add_uw,reftext=Add unsigned word] +==== add.uw + +Synopsis:: +Add unsigned word + +Mnemonic:: +add.uw _rd_, _rs1_, _rs2_ + + +Pseudoinstructions:: +zext.w _rd_, _rs1_ → add.uw _rd_, _rs1_, zero + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x3b, attr: ['OP-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x0, attr: ['ADD.UW'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x04, attr: ['ADD.UW'] }, +]} +.... + +Description:: +This instruction performs an XLEN-wide addition between _rs2_ and the zero-extended least-significant word of _rs1_. + +Operation:: +[source,sail] +-- +let base = X(rs2); +let index = EXTZ(X(rs1)[31..0]); + +X(rd) = base + index; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zba (<<#zba>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/andn.adoc b/src/insns/andn.adoc new file mode 100644 index 0000000..e0551b2 --- /dev/null +++ b/src/insns/andn.adoc @@ -0,0 +1,47 @@ +[#insns-andn,reftext="AND with inverted operand"] +==== andn + +Synopsis:: +AND with inverted operand + +Mnemonic:: +andn _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x7, attr: ['ANDN']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x20, attr: ['ANDN'] }, +]} +.... + +Description:: +This instruction performs the bitwise logical AND operation between _rs1_ and the bitwise inversion of _rs2_. 
+ +Operation:: +[source,sail] +-- +X(rd) = X(rs1) & ~X(rs2); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/bclr.adoc b/src/insns/bclr.adoc new file mode 100644 index 0000000..a5095fe --- /dev/null +++ b/src/insns/bclr.adoc @@ -0,0 +1,45 @@ +[#insns-bclr,reftext="Single-Bit Clear (Register)"] +==== bclr + +Synopsis:: +Single-Bit Clear (Register) + +Mnemonic:: +bclr _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BCLR'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x24, attr: ['BCLR/BEXT'] }, +]} +.... + +Description:: +This instruction returns _rs1_ with a single bit cleared at the index specified in _rs2_. +The index is read from the lower log2(XLEN) bits of _rs2_. + +Operation:: +[source,sail] +-- +let index = X(rs2) & (XLEN - 1); +X(rd) = X(rs1) & ~(1 << index) +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbs (<<#zbs>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/bclri.adoc b/src/insns/bclri.adoc new file mode 100644 index 0000000..bafc115 --- /dev/null +++ b/src/insns/bclri.adoc @@ -0,0 +1,59 @@ +[#insns-bclri,reftext="Single-Bit Clear (Immediate)"] +==== bclri + +Synopsis:: +Single-Bit Clear (Immediate) + +Mnemonic:: +bclri _rd_, _rs1_, _shamt_ + +Encoding (RV32):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BCLRI'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'shamt' }, + { bits: 7, name: 0x24, attr: ['BCLRI'] }, +]} +.... + +Encoding (RV64):: +[wavedrom, , svg] +.... 
+{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BCLRI'] }, + { bits: 5, name: 'rs1' }, + { bits: 6, name: 'shamt' }, + { bits: 6, name: 0x12, attr: ['BCLRI'] }, +]} +.... + +Description:: +This instruction returns _rs1_ with a single bit cleared at the index specified in _shamt_. +The index is read from the lower log2(XLEN) bits of _shamt_. +For RV32, the encodings corresponding to shamt[5]=1 are reserved. + +Operation:: +[source,sail] +-- +let index = shamt & (XLEN - 1); +X(rd) = X(rs1) & ~(1 << index) +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbs (<<#zbs>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/bext.adoc b/src/insns/bext.adoc new file mode 100644 index 0000000..22cd3fc --- /dev/null +++ b/src/insns/bext.adoc @@ -0,0 +1,46 @@ +[#insns-bext,reftext="Single-Bit Extract (Register)"] +==== bext + +Synopsis:: +Single-Bit Extract (Register) +// Should we describe this as a Set-if-bit-is-set? + +Mnemonic:: +bext _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['BEXT'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x24, attr: ['BCLR/BEXT'] }, +]} +.... + +Description:: +This instruction returns a single bit extracted from _rs1_ at the index specified in _rs2_. +The index is read from the lower log2(XLEN) bits of _rs2_. 
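A minimal sketch of the single-bit extract described above, written as a Python reference model rather than Sail. The function name `bext` and the `xlen` parameter are illustrative assumptions, not part of the specification text.

```python
def bext(rs1: int, rs2: int, xlen: int = 64) -> int:
    """Model of bext: return the selected bit of rs1 as 0 or 1.

    `xlen` is a hypothetical parameter standing in for XLEN (32 or 64).
    """
    rs1 &= (1 << xlen) - 1        # treat rs1 as an XLEN-bit register value
    index = rs2 & (xlen - 1)      # only the low log2(XLEN) bits of rs2 are used
    return (rs1 >> index) & 1
```

Because the index is masked, an out-of-range value in `rs2` wraps around instead of selecting a bit that does not exist.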
+ +Operation:: +[source,sail] +-- +let index = X(rs2) & (XLEN - 1); +X(rd) = (X(rs1) >> index) & 1; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbs (<<#zbs>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/bexti.adoc b/src/insns/bexti.adoc new file mode 100644 index 0000000..1f58ca7 --- /dev/null +++ b/src/insns/bexti.adoc @@ -0,0 +1,59 @@ +[#insns-bexti,reftext="Single-Bit Extract (Immediate)"] +==== bexti + +Synopsis:: +Single-Bit Extract (Immediate) + +Mnemonic:: +bexti _rd_, _rs1_, _shamt_ + +Encoding (RV32):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['BEXTI'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'shamt' }, + { bits: 7, name: 0x24, attr: ['BEXTI/BCLRI'] }, +]} +.... + +Encoding (RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['BEXTI'] }, + { bits: 5, name: 'rs1' }, + { bits: 6, name: 'shamt' }, + { bits: 6, name: 0x12, attr: ['BEXTI/BCLRI'] }, +]} +.... + +Description:: +This instruction returns a single bit extracted from _rs1_ at the index specified in _shamt_. +The index is read from the lower log2(XLEN) bits of _shamt_. +For RV32, the encodings corresponding to shamt[5]=1 are reserved. + +Operation:: +[source,sail] +-- +let index = shamt & (XLEN - 1); +X(rd) = (X(rs1) >> index) & 1; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbs (<<#zbs>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/binv.adoc b/src/insns/binv.adoc new file mode 100644 index 0000000..04cc930 --- /dev/null +++ b/src/insns/binv.adoc @@ -0,0 +1,45 @@ +[#insns-binv,reftext="Single-Bit Invert (Register)"] +==== binv + +Synopsis:: +Single-Bit Invert (Register) + +Mnemonic:: +binv _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +....
+{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BINV'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x34, attr: ['BINV'] }, +]} +.... + +Description:: +This instruction returns _rs1_ with a single bit inverted at the index specified in _rs2_. +The index is read from the lower log2(XLEN) bits of _rs2_. + +Operation:: +[source,sail] +-- +let index = X(rs2) & (XLEN - 1); +X(rd) = X(rs1) ^ (1 << index) +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbs (<<#zbs>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/binvi.adoc b/src/insns/binvi.adoc new file mode 100644 index 0000000..e7ec25e --- /dev/null +++ b/src/insns/binvi.adoc @@ -0,0 +1,59 @@ +[#insns-binvi,reftext="Single-Bit Invert (Immediate)"] +==== binvi + +Synopsis:: +Single-Bit Invert (Immediate) + +Mnemonic:: +binvi _rd_, _rs1_, _shamt_ + +Encoding (RV32):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BINV'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'shamt' }, + { bits: 7, name: 0x34, attr: ['BINVI'] }, +]} +.... + +Encoding (RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BINV'] }, + { bits: 5, name: 'rs1' }, + { bits: 6, name: 'shamt' }, + { bits: 6, name: 0x1a, attr: ['BINVI'] }, +]} +.... + +Description:: +This instruction returns _rs1_ with a single bit inverted at the index specified in _shamt_. +The index is read from the lower log2(XLEN) bits of _shamt_. +For RV32, the encodings corresponding to shamt[5]=1 are reserved. 
+ +Operation:: +[source,sail] +-- +let index = shamt & (XLEN - 1); +X(rd) = X(rs1) ^ (1 << index) +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbs (<<#zbs>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/bset.adoc b/src/insns/bset.adoc new file mode 100644 index 0000000..e39fbde --- /dev/null +++ b/src/insns/bset.adoc @@ -0,0 +1,44 @@ +[#insns-bset,reftext="Single-Bit Set (Register)"] +==== bset + +Synopsis:: +Single-Bit Set (Register) + +Mnemonic:: +bset _rd_, _rs1_,_rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BSET'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x14, attr: ['BSET'] }, +]} +.... + +Description:: +This instruction returns _rs1_ with a single bit set at the index specified in _rs2_. +The index is read from the lower log2(XLEN) bits of _rs2_. + +Operation:: +[source,sail] +-- +let index = X(rs2) & (XLEN - 1); +X(rd) = X(rs1) | (1 << index) +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbs (<<#zbs>>) +|0.93 +|Frozen +|=== diff --git a/src/insns/bseti.adoc b/src/insns/bseti.adoc new file mode 100644 index 0000000..9d80d98 --- /dev/null +++ b/src/insns/bseti.adoc @@ -0,0 +1,59 @@ +[#insns-bseti,reftext="Single-Bit Set (Immediate)"] +==== bseti + +Synopsis:: +Single-Bit Set (Immediate) + +Mnemonic:: +bseti _rd_, _rs1_,_shamt_ + +Encoding (RV32):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BSETI'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'shamt' }, + { bits: 7, name: 0x14, attr: ['BSETI'] }, +]} +.... + +Encoding (RV64):: +[wavedrom, , svg] +.... 
+{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BSETI'] }, + { bits: 5, name: 'rs1' }, + { bits: 6, name: 'shamt' }, + { bits: 6, name: 0x0a, attr: ['BSETI'] }, +]} +.... + +Description:: +This instruction returns _rs1_ with a single bit set at the index specified in _shamt_. +The index is read from the lower log2(XLEN) bits of _shamt_. +For RV32, the encodings corresponding to shamt[5]=1 are reserved. + +Operation:: +[source,sail] +-- +let index = shamt & (XLEN - 1); +X(rd) = X(rs1) | (1 << index) +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbs (<<#zbs>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/clmul.adoc b/src/insns/clmul.adoc new file mode 100644 index 0000000..b9976c9 --- /dev/null +++ b/src/insns/clmul.adoc @@ -0,0 +1,57 @@ +[#insns-clmul,reftext="Carry-less multiply (low-part)"] +==== clmul + +Synopsis:: +Carry-less multiply (low-part) + +Mnemonic:: +clmul _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['CLMUL'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x5, attr: ['MINMAX/CLMUL'] }, +]} +.... + +Description:: +clmul produces the lower half of the 2·XLEN carry-less product. 
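The carry-less product described above can be sketched as a shift-and-XOR loop. This is a Python reference model under stated assumptions: the function name `clmul` and the `xlen` parameter are illustrative, and the model keeps only the low XLEN bits of the 2·XLEN product.

```python
def clmul(rs1: int, rs2: int, xlen: int = 64) -> int:
    """Model of clmul: low XLEN bits of the carry-less product rs1 * rs2.

    `xlen` is a hypothetical parameter standing in for XLEN (32 or 64).
    """
    mask = (1 << xlen) - 1
    rs1 &= mask
    rs2 &= mask
    output = 0
    for i in range(xlen):
        if (rs2 >> i) & 1:
            # XOR replaces addition, so no carries propagate between bits
            output ^= rs1 << i
    return output & mask          # discard the upper half of the product
```

For example, `clmul(0b11, 0b11)` gives `0b101`, since (x + 1)² = x² + 1 when coefficients are taken modulo 2.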
+ +Operation:: +[source,sail] +-- +let rs1_val = X(rs1); +let rs2_val = X(rs2); +let output : xlenbits = 0; + +foreach (i from 0 to (xlen - 1) by 1) { + output = if ((rs2_val >> i) & 1) + then output ^ (rs1_val << i); + else output; +} + +X[rd] = output +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbc (<<#zbc>>) +|0.93 +|Frozen + +|Zbkc (<<#zbkc>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/clmulh.adoc b/src/insns/clmulh.adoc new file mode 100644 index 0000000..e4c6d88 --- /dev/null +++ b/src/insns/clmulh.adoc @@ -0,0 +1,57 @@ +[#insns-clmulh,reftext="Carry-less multiply (high-part)"] +==== clmulh + +Synopsis:: +Carry-less multiply (high-part) + +Mnemonic:: +clmulh _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x3, attr: ['CLMULH'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x5, attr: ['MINMAX/CLMUL'] }, +]} +.... + +Description:: +clmulh produces the upper half of the 2·XLEN carry-less product. + +Operation:: +[source,sail] +-- +let rs1_val = X(rs1); +let rs2_val = X(rs2); +let output : xlenbits = 0; + +foreach (i from 1 to xlen by 1) { + output = if ((rs2_val >> i) & 1) + then output ^ (rs1_val >> (xlen - i)); + else output; +} + +X[rd] = output +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbc (<<#zbc>>) +|0.93 +|Frozen + +|Zbkc (<<#zbkc>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/clmulr.adoc b/src/insns/clmulr.adoc new file mode 100644 index 0000000..1db4ca7 --- /dev/null +++ b/src/insns/clmulr.adoc @@ -0,0 +1,63 @@ +[#insns-clmulr,reftext="Carry-less multiply (reversed)"] +==== clmulr + +Synopsis:: +Carry-less multiply (reversed) + +Mnemonic:: +clmulr _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... 
+{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x2, attr: ['CLMULR'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x5, attr: ['MINMAX/CLMUL'] }, +]} +.... + +Description:: +*clmulr* produces bits 2·XLEN−2:XLEN-1 of the 2·XLEN carry-less +product. + +Operation:: +[source,sail] +-- +let rs1_val = X(rs1); +let rs2_val = X(rs2); +let output : xlenbits = 0; + +foreach (i from 0 to (xlen - 1) by 1) { + output = if ((rs2_val >> i) & 1) + then output ^ (rs1_val >> (xlen - i - 1)); + else output; +} + +X[rd] = output +-- + +.Note +[NOTE, caption="A" ] +=============================================================== +The *clmulr* instruction is used to accelerate CRC calculations. +The *r* in the instruction's mnemonic stands for _reversed_, as the +instruction is equivalent to bit-reversing the inputs, performing +a *clmul*, then bit-reversing the output. +=============================================================== + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbc (<<#zbc>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/clz.adoc b/src/insns/clz.adoc new file mode 100644 index 0000000..898d5f5 --- /dev/null +++ b/src/insns/clz.adoc @@ -0,0 +1,52 @@ +[#insns-clz,reftext="Count leading zero bits"] +==== clz + +Synopsis:: +Count leading zero bits + +Mnemonic:: +clz _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['CLZ'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 0x0, attr: ['CLZ'] }, + { bits: 7, name: 0x30, attr: ['CLZ'] }, +]} +.... + +Description:: +This instruction counts the number of 0's before the first 1, starting at the most-significant bit (i.e., XLEN-1) and progressing to bit 0. Accordingly, if the input is 0, the output is XLEN, and if the most-significant bit of the input is a 1, the output is 0. 
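As a quick cross-check of the boundary cases in the description above (input 0 gives XLEN, most-significant bit set gives 0), here is a small Python model. The function name `clz` and the `xlen` parameter are assumptions made for illustration; the model uses Python's `int.bit_length()` instead of an explicit scan.

```python
def clz(rs: int, xlen: int = 64) -> int:
    """Model of clz: count zero bits above the highest set bit.

    `xlen` is a hypothetical parameter standing in for XLEN (32 or 64).
    """
    rs &= (1 << xlen) - 1         # treat rs as an XLEN-bit register value
    # bit_length() is the index of the highest set bit plus one, or 0 for rs == 0
    return xlen - rs.bit_length()
```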
+ +Operation:: +[source,sail] +-- +val HighestSetBit : forall ('N : Int), 'N >= 0. bits('N) -> int + +function HighestSetBit x = { + foreach (i from (xlen - 1) to 0 by 1 in dec) + if [x[i]] == 0b1 then return(i) else (); + return -1; +} + +let rs = X(rs); +X[rd] = (xlen - 1) - HighestSetBit(rs); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/clzw.adoc b/src/insns/clzw.adoc new file mode 100644 index 0000000..67a64d9 --- /dev/null +++ b/src/insns/clzw.adoc @@ -0,0 +1,53 @@ +[#insns-clzw,reftext="Count leading zero bits in word"] +==== clzw + +Synopsis:: +Count leading zero bits in word + +Mnemonic:: +clzw _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x1b, attr: ['OP-IMM-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['CLZW'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 0x0, attr: ['CLZW'] }, + { bits: 7, name: 0x30, attr: ['CLZW'] }, +]} +.... + +Description:: +This instruction counts the number of 0's before the first 1 starting at bit 31 and progressing to bit 0. +Accordingly, if the least-significant word is 0, the output is 32, and if the most-significant bit of the word (i.e., bit 31) is a 1, the output is 0. + +Operation:: +[source,sail] +-- +val HighestSetBit32 : forall ('N : Int), 'N >= 0. 
bits('N) -> int + +function HighestSetBit32 x = { + foreach (i from 31 to 0 by 1 in dec) + if [x[i]] == 0b1 then return(i) else (); + return -1; +} + +let rs = X(rs); +X[rd] = 31 - HighestSetBit32(rs); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/cpop.adoc b/src/insns/cpop.adoc new file mode 100644 index 0000000..24e7a2f --- /dev/null +++ b/src/insns/cpop.adoc @@ -0,0 +1,56 @@ +[#insns-cpop,reftext="Count set bits"] +==== cpop + +Synopsis:: +Count set bits + +Mnemonic:: +cpop _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['CPOP'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 0x2, attr: ['CPOP'] }, + { bits: 7, name: 0x30, attr: ['CPOP'] }, +]} +.... +Description:: +This instruction counts the number of 1's (i.e., set bits) in the source register. + +Operation:: +[source,sail] +-- +let bitcount = 0; +let rs = X(rs); + +foreach (i from 0 to (xlen - 1) in inc) + if rs[i] == 0b1 then bitcount = bitcount + 1 else (); + +X[rd] = bitcount +-- + +.Software Hint +[NOTE, caption="SH" ] +=============================================================== +This operation is known as population count, popcount, sideways sum, bit summation, or Hamming weight. + +The GCC builtin function `+__builtin_popcount (unsigned int x)+` is implemented by *cpop* on RV32 and by *cpopw* on RV64. +The GCC builtin function `+__builtin_popcountl (unsigned long x)+` for LP64 is implemented by *cpop* on RV64.
+=============================================================== + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== diff --git a/src/insns/cpopw.adoc b/src/insns/cpopw.adoc new file mode 100644 index 0000000..d61336a --- /dev/null +++ b/src/insns/cpopw.adoc @@ -0,0 +1,49 @@ +[#insns-cpopw,reftext="Count set bits in word"] +==== cpopw + +Synopsis:: +Count set bits in word + +Mnemonic:: +cpopw _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x1b, attr: ['OP-IMM-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['CPOPW'] }, + { bits: 5, name: 'rs' }, + { bits: 5, name: 0x2, attr: ['CPOPW'] }, + { bits: 7, name: 0x30, attr: ['CPOPW'] }, +]} +.... +Description:: +This instruction counts the number of 1's (i.e., set bits) in the least-significant word of the source register. + +Operation:: +[source,sail] +-- +let bitcount = 0; +let val = X(rs); + +foreach (i from 0 to 31 in inc) + if val[i] == 0b1 then bitcount = bitcount + 1 else (); + +X[rd] = bitcount +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + + diff --git a/src/insns/ctz.adoc b/src/insns/ctz.adoc new file mode 100644 index 0000000..545d768 --- /dev/null +++ b/src/insns/ctz.adoc @@ -0,0 +1,53 @@ +[#insns-ctz,reftext="Count trailing zero bits"] +==== ctz + +Synopsis:: +Count trailing zeros + +Mnemonic:: +ctz _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['CTZ/CTZW'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 0x1, attr: ['CTZ/CTZW'] }, + { bits: 7, name: 0x30, attr: ['CTZ/CTZW'] }, +]} +.... + +Description:: +This instruction counts the number of 0's before the first 1, starting at the least-significant bit (i.e., 0) and progressing to the most-significant bit (i.e., XLEN-1).
+Accordingly, if the input is 0, the output is XLEN, and if the least-significant bit of the input is a 1, the output is 0. + +Operation:: +[source,sail] +-- +val LowestSetBit : forall ('N : Int), 'N >= 0. bits('N) -> int + +function LowestSetBit x = { + foreach (i from 0 to (xlen - 1) by 1 in inc) + if [x[i]] == 0b1 then return(i) else (); + return xlen; +} + +let rs = X(rs); +X[rd] = LowestSetBit(rs); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/ctzw.adoc b/src/insns/ctzw.adoc new file mode 100644 index 0000000..7442cf0 --- /dev/null +++ b/src/insns/ctzw.adoc @@ -0,0 +1,52 @@ +[#insns-ctzw,reftext="Count trailing zero bits in word"] +==== ctzw + +Synopsis:: +Count trailing zero bits in word + +Mnemonic:: +ctzw _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x1b, attr: ['OP-IMM-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['CTZ/CTZW'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 0x1, attr: ['CTZ/CTZW'] }, + { bits: 7, name: 0x30, attr: ['CTZ/CTZW'] }, +]} +.... + +Description:: +This instruction counts the number of 0's before the first 1, starting at the least-significant bit (i.e., 0) and progressing to the most-significant bit of the least-significant word (i.e., 31). Accordingly, if the least-significant word is 0, the output is 32, and if the least-significant bit of the input is a 1, the output is 0. + +Operation:: +[source,sail] +-- +val LowestSetBit32 : forall ('N : Int), 'N >= 0.
bits('N) -> int + +function LowestSetBit32 x = { + foreach (i from 0 to 31 by 1 in inc) + if [x[i]] == 0b1 then return(i) else (); + return 32; +} + +let rs = X(rs); +X[rd] = LowestSetBit32(rs); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/max.adoc b/src/insns/max.adoc new file mode 100644 index 0000000..621198e --- /dev/null +++ b/src/insns/max.adoc @@ -0,0 +1,60 @@ +[#insns-max,reftext="Maximum"] +==== max + +Synopsis:: +Maximum + +Mnemonic:: +max _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x6, attr: ['MAX']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x05, attr: ['MINMAX/CLMUL'] }, +]} +.... + +Description:: +This instruction returns the larger of two signed integers. + +Operation:: +[source,sail] +-- +let rs1_val = X(rs1); +let rs2_val = X(rs2); + +let result = if rs1_val <_s rs2_val + then rs2_val + else rs1_val; + +X(rd) = result; +-- + +.Software Hint +[NOTE, caption="SW"] +=============================================================== +Calculating the absolute value of a signed integer can be performed +using the following sequence: *neg rD,rS* followed by *max +rD,rS,rD*. When using this common sequence, it is suggested that they +are scheduled with no intervening instructions so that +implementations that are so optimized can fuse them together.
+=============================================================== + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/maxu.adoc b/src/insns/maxu.adoc new file mode 100644 index 0000000..d2473a7 --- /dev/null +++ b/src/insns/maxu.adoc @@ -0,0 +1,50 @@ +[#insns-maxu,reftext="Unsigned maximum"] +==== maxu + +Synopsis:: +Unsigned maximum + +Mnemonic:: +maxu _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x7, attr: ['MAXU']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x05, attr: ['MINMAX/CLMUL'] }, +]} +.... + +Description:: +This instruction returns the larger of two unsigned integers. + +Operation:: +[source,sail] +-- +let rs1_val = X(rs1); +let rs2_val = X(rs2); + +let result = if rs1_val <_u rs2_val + then rs2_val + else rs1_val; + +X(rd) = result; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/min.adoc b/src/insns/min.adoc new file mode 100644 index 0000000..550ca69 --- /dev/null +++ b/src/insns/min.adoc @@ -0,0 +1,50 @@ +[#insns-min,reftext="Minimum"] +==== min + +Synopsis:: +Minimum + +Mnemonic:: +min _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x4, attr: ['MIN']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x05, attr: ['MINMAX/CLMUL'] }, +]} +.... + +Description:: +This instruction returns the smaller of two signed integers. 
+ +Operation:: +[source,sail] +-- +let rs1_val = X(rs1); +let rs2_val = X(rs2); + +let result = if rs1_val <_s rs2_val + then rs1_val + else rs2_val; + +X(rd) = result; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/minu.adoc b/src/insns/minu.adoc new file mode 100644 index 0000000..8ff623d --- /dev/null +++ b/src/insns/minu.adoc @@ -0,0 +1,50 @@ +[#insns-minu,reftext="Unsigned minimum"] +==== minu + +Synopsis:: +Unsigned minimum + +Mnemonic:: +minu _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['MINU']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x05, attr: ['MINMAX/CLMUL'] }, +]} +.... + +Description:: +This instruction returns the smaller of two unsigned integers. + +Operation:: +[source,sail] +-- +let rs1_val = X(rs1); +let rs2_val = X(rs2); + +let result = if rs1_val <_u rs2_val + then rs1_val + else rs2_val; + +X(rd) = result; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/orc_b.adoc b/src/insns/orc_b.adoc new file mode 100644 index 0000000..2a16d18 --- /dev/null +++ b/src/insns/orc_b.adoc @@ -0,0 +1,51 @@ +[#insns-orc_b,reftext="Bitwise OR-Combine, byte granule"] +==== orc.b + +Synopsis:: +Bitwise OR-Combine, byte granule + +Mnemonic:: +orc.b _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5 }, + { bits: 5, name: 'rs' }, + { bits: 12, name: 0x287 } +]} +.... + +Description:: +Combines the bits within each byte using bitwise logical OR. 
+This sets the bits of each byte in the result _rd_ to all zeros if no bit within the respective byte of _rs_ is set, or to all ones if any bit within the respective byte of _rs_ is set. + +Operation:: +[source,sail] +-- +let input = X(rs); +let output : xlenbits = 0; + +foreach (i from 0 to (xlen - 8) by 8) { + output[(i + 7)..i] = if input[(i + 7)..i] == 0 + then 0b00000000 + else 0b11111111; +} + +X[rd] = output; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== diff --git a/src/insns/orn.adoc b/src/insns/orn.adoc new file mode 100644 index 0000000..7a6eefb --- /dev/null +++ b/src/insns/orn.adoc @@ -0,0 +1,47 @@ +[#insns-orn,reftext="OR with inverted operand"] +==== orn + +Synopsis:: +OR with inverted operand + +Mnemonic:: +orn _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x6, attr: ['ORN']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x20, attr: ['ORN'] }, +]} +.... + +Description:: +This instruction performs the bitwise logical OR operation between _rs1_ and the bitwise inversion of _rs2_. + +Operation:: +[source,sail] +-- +X(rd) = X(rs1) | ~X(rs2); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/pack.adoc b/src/insns/pack.adoc new file mode 100644 index 0000000..82c3b2a --- /dev/null +++ b/src/insns/pack.adoc @@ -0,0 +1,46 @@ +[#insns-pack,reftext="Pack low halves of registers"] +==== pack + +Synopsis:: +Pack the low halves of _rs1_ and _rs2_ into _rd_. + +Mnemonic:: +pack _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... 
+{reg:[ + {bits: 7, name: 0x33, attr: ['OP'] }, + {bits: 5, name: 'rd'}, + {bits: 3, name: 0x4, attr:['PACK']}, + {bits: 5, name: 'rs1'}, + {bits: 5, name: 'rs2'}, + {bits: 7, name: 0x4, attr:['PACK']}, +]} +.... + +Description:: +The pack instruction packs the XLEN/2-bit lower halves of _rs1_ and _rs2_ into +_rd_, with _rs1_ in the lower half and _rs2_ in the upper half. + +Operation:: +[source,sail] +-- +let lo_half : bits(xlen/2) = X(rs1)[xlen/2-1..0]; +let hi_half : bits(xlen/2) = X(rs2)[xlen/2-1..0]; +X(rd) = EXTZ(hi_half @ lo_half); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/packh.adoc b/src/insns/packh.adoc new file mode 100644 index 0000000..1af719e --- /dev/null +++ b/src/insns/packh.adoc @@ -0,0 +1,47 @@ +[#insns-packh,reftext="Pack low bytes of registers"] +==== packh + +Synopsis:: +Pack the low bytes of _rs1_ and _rs2_ into _rd_. + +Mnemonic:: +packh _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + {bits: 7, name: 0x33, attr: ['OP'] }, + {bits: 5, name: 'rd'}, + {bits: 3, name: 0x7, attr: ['PACKH']}, + {bits: 5, name: 'rs1'}, + {bits: 5, name: 'rs2'}, + {bits: 7, name: 0x4, attr: ['PACKH']}, +]} +.... + +Description:: +The packh instruction packs the least-significant bytes of +_rs1_ and _rs2_ into the 16 least-significant bits of _rd_, +zero extending the rest of _rd_.
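The byte packing described above can be illustrated with a short Python model. The function name `packh` is an illustrative choice; the model relies only on the description (low byte of _rs1_ in bits 7..0, low byte of _rs2_ in bits 15..8, zero elsewhere).

```python
def packh(rs1: int, rs2: int) -> int:
    """Model of packh: combine the low bytes of rs1 and rs2 into 16 bits."""
    lo = rs1 & 0xFF               # least-significant byte of rs1
    hi = rs2 & 0xFF               # least-significant byte of rs2
    return (hi << 8) | lo         # upper bits stay zero (zero extension)
```

For instance, `packh(0x1234, 0xABCD)` yields `0xCD34`: only the low byte of each source contributes.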
+ +Operation:: +[source,sail] +-- +let lo_half : bits(8) = X(rs1)[7..0]; +let hi_half : bits(8) = X(rs2)[7..0]; +X(rd) = EXTZ(hi_half @ lo_half); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/packw.adoc b/src/insns/packw.adoc new file mode 100644 index 0000000..78c5e1b --- /dev/null +++ b/src/insns/packw.adoc @@ -0,0 +1,49 @@ +[#insns-packw,reftext="Pack low 16-bits of registers (RV64)"] +==== packw + +Synopsis:: +Pack the low 16-bits of _rs1_ and _rs2_ into _rd_ on RV64. + +Mnemonic:: +packw _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ +{bits: 2, name: 0x3}, +{bits: 5, name: 0xe}, +{bits: 5, name: 'rd'}, +{bits: 3, name: 0x4}, +{bits: 5, name: 'rs1'}, +{bits: 5, name: 'rs2'}, +{bits: 7, name: 0x4}, +]} +.... + +Description:: +This instruction packs the low 16 bits of +_rs1_ and _rs2_ into the 32 least-significant bits of _rd_, +sign extending the 32-bit result to the rest of _rd_. +This instruction only exists on RV64 based systems. + +Operation:: +[source,sail] +-- +let lo_half : bits(16) = X(rs1)[15..0]; +let hi_half : bits(16) = X(rs2)[15..0]; +X(rd) = EXTS(hi_half @ lo_half); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/rev8.adoc b/src/insns/rev8.adoc new file mode 100644 index 0000000..5c92550 --- /dev/null +++ b/src/insns/rev8.adoc @@ -0,0 +1,82 @@ +[#insns-rev8,reftext="Byte-reverse register"] +==== rev8 + +Synopsis:: +Byte-reverse register + +Mnemonic:: +rev8 _rd_, _rs_ + +Encoding (RV32):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5 }, + { bits: 5, name: 'rs' }, + { bits: 12, name: 0x698 } +]} +.... + +Encoding (RV64):: +[wavedrom, , svg] +.... 
+{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5 }, + { bits: 5, name: 'rs' }, + { bits: 12, name: 0x6b8 } +]} +.... + +Description:: +This instruction reverses the order of the bytes in _rs_. + +Operation:: +[source,sail] +-- +let input = X(rs); +let output : xlenbits = 0; +let j = xlen - 1; + +foreach (i from 0 to (xlen - 8) by 8) { + output[i..(i + 7)] = input[(j - 7)..j]; + j = j - 8; +} + +X[rd] = output +-- + +.Note +[NOTE, caption="A" ] +=============================================================== +The *rev8* mnemonic corresponds to different instruction encodings in RV32 and RV64. +=============================================================== + +.Software Hint +[NOTE, caption="SH" ] +=============================================================== +The byte-reverse operation is only available for the full register +width. To emulate word-sized and halfword-sized byte-reversal, +perform a `rev8 rd,rs` followed by a `srai rd,rd,K`, where K is +XLEN-32 and XLEN-16, respectively. +=============================================================== + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/revb.adoc b/src/insns/revb.adoc new file mode 100644 index 0000000..67b83d1 --- /dev/null +++ b/src/insns/revb.adoc @@ -0,0 +1,46 @@ +[#insns-revb,reftext="Reverse bits in bytes"] +==== rev.b + +Synopsis:: +Reverse the bits in each byte of a source register. + +Mnemonic:: +rev.b _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5 }, + { bits: 5, name: 'rs' }, + { bits: 12, name: 0x687 } +]} +.... + +Description:: +This instruction reverses the order of the bits in every byte of a register. 
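A minimal Python sketch of the per-byte bit reversal described above. The function name `rev_b` and the `xlen` parameter are assumptions for illustration; the byte reversal uses string slicing for clarity rather than speed.

```python
def rev_b(rs: int, xlen: int = 64) -> int:
    """Model of rev.b: reverse the bit order inside every byte of rs.

    `xlen` is a hypothetical parameter standing in for XLEN (32 or 64).
    """
    result = 0
    for i in range(0, xlen, 8):                   # i = 0, 8, ..., xlen - 8
        byte = (rs >> i) & 0xFF
        reversed_byte = int(f"{byte:08b}"[::-1], 2)
        result |= reversed_byte << i              # byte position is unchanged
    return result
```

Bytes keep their positions; only the bits within each byte are mirrored, so applying the operation twice returns the original value.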
+ +Operation:: +[source,sail] +-- +result : xlenbits = EXTZ(0b0); +foreach (i from 0 to sizeof(xlen) - 8 by 8) { + result[i+7..i] = reverse_bits_in_byte(X(rs)[i+7..i]); +}; +X(rd) = result; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/rol.adoc b/src/insns/rol.adoc new file mode 100644 index 0000000..f937096 --- /dev/null +++ b/src/insns/rol.adoc @@ -0,0 +1,52 @@ +[#insns-rol,reftext="Rotate left (Register)"] +==== rol + +Synopsis:: +Rotate Left (Register) + +Mnemonic:: +rol _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['ROL']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x30, attr: ['ROL'] }, +]} +.... + +Description:: +This instruction performs a rotate left of _rs1_ by the amount in the least-significant log2(XLEN) bits of _rs2_. + +Operation:: +[source,sail] +-- +let shamt = if xlen == 32 + then X(rs2)[4..0] + else X(rs2)[5..0]; +let result = (X(rs1) << shamt) | (X(rs1) >> (xlen - shamt)); + +X(rd) = result; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/rolw.adoc b/src/insns/rolw.adoc new file mode 100644 index 0000000..feed9a7 --- /dev/null +++ b/src/insns/rolw.adoc @@ -0,0 +1,51 @@ +[#insns-rolw,reftext="Rotate Left Word (Register)"] +==== rolw + +Synopsis:: +Rotate Left Word (Register) + +Mnemonic:: +rolw _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x3b, attr: ['OP-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['ROLW']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x30, attr: ['ROLW'] }, +]} +....
+ +Description:: +This instruction performs a rotate left on the least-significant word of _rs1_ by the amount in the least-significant 5 bits of _rs2_. +The resulting word value is sign-extended by copying bit 31 to all of the more-significant bits. + +Operation:: +[source,sail] +-- +let rs1 = EXTZ(X(rs1)[31..0]); +let shamt = X(rs2)[4..0]; +let result = (rs1 << shamt) | (rs1 >> (32 - shamt)); +X(rd) = EXTS(result[31..0]); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/ror.adoc b/src/insns/ror.adoc new file mode 100644 index 0000000..c8a653f --- /dev/null +++ b/src/insns/ror.adoc @@ -0,0 +1,52 @@ +[#insns-ror,reftext="Rotate right (Register)"] +==== ror + +Synopsis:: +Rotate Right (Register) + +Mnemonic:: +ror _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['ROR']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x30, attr: ['ROR'] }, +]} +.... + +Description:: +This instruction performs a rotate right of _rs1_ by the amount in the least-significant log2(XLEN) bits of _rs2_. + +Operation:: +[source,sail] +-- +let shamt = if xlen == 32 + then X(rs2)[4..0] + else X(rs2)[5..0]; +let result = (X(rs1) >> shamt) | (X(rs1) << (xlen - shamt)); + +X(rd) = result; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/rori.adoc b/src/insns/rori.adoc new file mode 100644 index 0000000..a63256e --- /dev/null +++ b/src/insns/rori.adoc @@ -0,0 +1,66 @@ +[#insns-rori,reftext="Rotate right (Immediate)"] +==== rori + +Synopsis:: +Rotate Right (Immediate) + +Mnemonic:: +rori _rd_, _rs1_, _shamt_ + +Encoding (RV32):: +[wavedrom, , svg] +....
+{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['RORI']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'shamt' }, + { bits: 7, name: 0x30, attr: ['RORI'] }, +]} +.... + +Encoding (RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['RORI']}, + { bits: 5, name: 'rs1' }, + { bits: 6, name: 'shamt' }, + { bits: 6, name: 0x18, attr: ['RORI'] }, +]} +.... + +Description:: +This instruction performs a rotate right of _rs1_ by the amount in the least-significant log2(XLEN) bits of _shamt_. +For RV32, the encodings corresponding to shamt[5]=1 are reserved. + +Operation:: +[source,sail] +-- +let shamt = if xlen == 32 + then shamt[4..0] + else shamt[5..0]; +let result = (X(rs1) >> shamt) | (X(rs1) << (xlen - shamt)); + +X(rd) = result; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/roriw.adoc b/src/insns/roriw.adoc new file mode 100644 index 0000000..65b8fd9 --- /dev/null +++ b/src/insns/roriw.adoc @@ -0,0 +1,54 @@ +[#insns-roriw,reftext="Rotate right Word (Immediate)"] +==== roriw + +Synopsis:: +Rotate Right Word by Immediate + +Mnemonic:: +roriw _rd_, _rs1_, _shamt_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x1b, attr: ['OP-IMM-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['RORIW']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'shamt' }, + { bits: 7, name: 0x30, attr: ['RORIW'] }, +]} +.... + +Description:: +This instruction performs a rotate right on the least-significant word +of _rs1_ by the amount in the least-significant log2(XLEN) bits of +_shamt_. +The resulting word value is sign-extended by copying bit 31 to all of +the more-significant bits. 
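The word rotate with sign extension described above can be sketched in Python as follows. This is an illustrative model only: the function name `roriw` and the `xlen` parameter are assumptions made here, and the rotate amount is masked to 5 bits to match the 5-bit _shamt_ field shown in the encoding.

```python
def roriw(rs1: int, shamt: int, xlen: int = 64) -> int:
    """Rotate the low 32 bits of rs1 right by shamt, then sign-extend to xlen.

    Hypothetical model for illustration, not the normative semantics.
    """
    word = rs1 & 0xFFFFFFFF          # only the least-significant word takes part
    shamt &= 0x1F                    # 5-bit rotate amount
    rotated = ((word >> shamt) | (word << (32 - shamt))) & 0xFFFFFFFF
    if rotated & 0x80000000:
        # Copy bit 31 into every more-significant bit of the result.
        rotated |= ((1 << xlen) - 1) & ~0xFFFFFFFF
    return rotated
```

Note that the upper half of _rs1_ never influences the result, which is why the model masks the input before rotating.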
+ + +Operation:: +[source,sail] +-- +let rs1_data = EXTZ(X(rs1)[31..0]); +let result = (rs1_data >> shamt) | (rs1_data << (32 - shamt)); +X(rd) = EXTS(result[31..0]); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/rorw.adoc b/src/insns/rorw.adoc new file mode 100644 index 0000000..d06d52f --- /dev/null +++ b/src/insns/rorw.adoc @@ -0,0 +1,51 @@ +[#insns-rorw,reftext="Rotate right Word (Register)"] +==== rorw + +Synopsis:: +Rotate Right Word (Register) + +Mnemonic:: +rorw _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x3b, attr: ['OP-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['RORW']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x30, attr: ['RORW'] }, +]} +.... + +Description:: +This instruction performs a rotate right on the least-significant word of _rs1_ by the amount in the least-significant 5 bits of _rs2_. +The resulting word value is sign-extended by copying bit 31 to all of the more-significant bits. + +Operation:: +[source,sail] +-- +let rs1 = EXTZ(X(rs1)[31..0]); +let shamt = X(rs2)[4..0]; +let result = (rs1 >> shamt) | (rs1 << (32 - shamt)); +X(rd) = EXTS(result[31..0]); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/sext_b.adoc b/src/insns/sext_b.adoc new file mode 100644 index 0000000..87a7571 --- /dev/null +++ b/src/insns/sext_b.adoc @@ -0,0 +1,43 @@ +[#insns-sext_b,reftext="Sign-extend byte"] +==== sext.b + +Synopsis:: +Sign-extend byte + +Mnemonic:: +sext.b _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +....
+{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['SEXT.B/SEXT.H'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 0x04, attr: ['SEXT.B'] }, + { bits: 7, name: 0x30 }, +]} +.... + +Description:: +This instruction sign-extends the least-significant byte in the source to XLEN by copying the most-significant bit in the byte (i.e., bit 7) to all of the more-significant bits. + +Operation:: +[source,sail] +-- +X(rd) = EXTS(X(rs)[7..0]); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/sext_h.adoc b/src/insns/sext_h.adoc new file mode 100644 index 0000000..f7208a5 --- /dev/null +++ b/src/insns/sext_h.adoc @@ -0,0 +1,43 @@ +[#insns-sext_h,reftext="Sign-extend halfword"] +==== sext.h + +Synopsis:: +Sign-extend halfword + +Mnemonic:: +sext.h _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['SEXT.B/SEXT.H'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 0x05, attr: ['SEXT.H'] }, + { bits: 7, name: 0x30 }, +]} +.... + +Description:: +This instruction sign-extends the least-significant halfword in _rs_ to XLEN by copying the most-significant bit in the halfword (i.e., bit 15) to all of the more-significant bits. + +Operation:: +[source,sail] +-- +X(rd) = EXTS(X(rs)[15..0]); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/sh1add.adoc b/src/insns/sh1add.adoc new file mode 100644 index 0000000..636fc54 --- /dev/null +++ b/src/insns/sh1add.adoc @@ -0,0 +1,46 @@ +[#insns-sh1add,reftext=Shift left by 1 and add] +==== sh1add + +Synopsis:: +Shift left by 1 and add + +Mnemonic:: +sh1add _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... 
+{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x2, attr: ['SH1ADD'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x10, attr: ['SH1ADD'] }, +]} +.... + +Description:: +This instruction shifts _rs1_ to the left by 1 bit and adds it to _rs2_. + +Operation:: +[source,sail] +-- +X(rd) = X(rs2) + (X(rs1) << 1); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zba (<<#zba>>) +|0.93 +|Frozen +|=== + + +// We have decided that this and all other instructions will not have reserved encodings for "useless encodings" +// We could follow suit of the base ISA and create HINTs if there is some recognized value for doing so diff --git a/src/insns/sh1add_uw.adoc b/src/insns/sh1add_uw.adoc new file mode 100644 index 0000000..09e515d --- /dev/null +++ b/src/insns/sh1add_uw.adoc @@ -0,0 +1,46 @@ +[#insns-sh1add_uw,reftext=Shift unsigned word left by 1 and add] +==== sh1add.uw + +Synopsis:: +Shift unsigned word left by 1 and add + +Mnemonic:: +sh1add.uw _rd_, _rs1_, _rs2_ +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x3b, attr: ['OP-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x2, attr: ['SH1ADD.UW'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x10, attr: ['SH1ADD.UW'] }, +]} +.... + +Description:: +This instruction performs an XLEN-wide addition of two addends. +The first addend is _rs2_. The second addend is the unsigned value formed by extracting the least-significant word of _rs1_ and shifting it left by 1 place. 
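The two addends described above can be modelled with a short Python sketch. It assumes a 64-bit register width by default; the function name `sh1add_uw` and the `xlen` parameter are introduced here for illustration only.

```python
def sh1add_uw(rs1: int, rs2: int, xlen: int = 64) -> int:
    """Add rs2 to the zero-extended low word of rs1 shifted left by 1.

    Hypothetical model for illustration; the sum wraps modulo 2**xlen.
    """
    index = rs1 & 0xFFFFFFFF         # unsigned least-significant word of rs1
    base = rs2 & ((1 << xlen) - 1)   # full-width first addend
    return (base + (index << 1)) & ((1 << xlen) - 1)
```

A typical use is address generation for an array of 2-byte elements indexed by an unsigned 32-bit value: the upper half of the index register is ignored, so no separate zero-extension is needed.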
+ +Operation:: +[source,sail] +-- +let base = X(rs2); +let index = EXTZ(X(rs1)[31..0]); + +X(rd) = base + (index << 1); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zba (<<#zba>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/sh2add.adoc b/src/insns/sh2add.adoc new file mode 100644 index 0000000..273a5df --- /dev/null +++ b/src/insns/sh2add.adoc @@ -0,0 +1,43 @@ +[#insns-sh2add,reftext=Shift left by 2 and add] +==== sh2add + +Synopsis:: +Shift left by 2 and add + +Mnemonic:: +sh2add _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x4, attr: ['SH2ADD'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x10, attr: ['SH2ADD'] }, +]} +.... + +Description:: +This instruction shifts _rs1_ to the left by 2 places and adds it to _rs2_. + +Operation:: +[source,sail] +-- +X(rd) = X(rs2) + (X(rs1) << 2); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zba (<<#zba>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/sh2add_uw.adoc b/src/insns/sh2add_uw.adoc new file mode 100644 index 0000000..44a9ade --- /dev/null +++ b/src/insns/sh2add_uw.adoc @@ -0,0 +1,48 @@ +[#insns-sh2add_uw,reftext=Shift unsigned word left by 2 and add] +==== sh2add.uw + +Synopsis:: +Shift unsigned word left by 2 and add + +Mnemonic:: +sh2add.uw _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x3b, attr: ['OP-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x4, attr: ['SH2ADD.UW'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x10, attr: ['SH2ADD.UW'] }, +]} +.... + +Description:: +This instruction performs an XLEN-wide addition of two addends. +The first addend is _rs2_. 
+The second addend is the unsigned value formed by extracting the least-significant word of _rs1_ and shifting it left by 2 places. + +Operation:: +[source,sail] +-- +let base = X(rs2); +let index = EXTZ(X(rs1)[31..0]); + +X(rd) = base + (index << 2); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zba (<<#zba>>) +|0.93 +|Frozen +|=== + diff --git a/src/insns/sh3add.adoc b/src/insns/sh3add.adoc new file mode 100644 index 0000000..2ebc08b --- /dev/null +++ b/src/insns/sh3add.adoc @@ -0,0 +1,42 @@ +[#insns-sh3add,reftext=Shift left by 3 and add] +==== sh3add + +Synopsis:: +Shift left by 3 and add + +Mnemonic:: +sh3add _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x6, attr: ['SH3ADD'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x10, attr: ['SH3ADD'] }, +]} +.... + +Description:: +This instruction shifts _rs1_ to the left by 3 places and adds it to _rs2_. + +Operation:: +[source,sail] +-- +X(rd) = X(rs2) + (X(rs1) << 3); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zba (<<#zba>>) +|0.93 +|Frozen +|=== diff --git a/src/insns/sh3add_uw.adoc b/src/insns/sh3add_uw.adoc new file mode 100644 index 0000000..500c32c --- /dev/null +++ b/src/insns/sh3add_uw.adoc @@ -0,0 +1,45 @@ +[#insns-sh3add_uw,reftext=Shift unsigned word left by 3 and add] +==== sh3add.uw + +Synopsis:: +Shift unsigned word left by 3 and add + +Mnemonic:: +sh3add.uw _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x3b, attr: ['OP-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x6, attr: ['SH3ADD.UW'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x10, attr: ['SH3ADD.UW'] }, +]} +.... + +Description:: +This instruction performs an XLEN-wide addition of two addends. 
The first addend is _rs2_. The second addend is the unsigned value formed by extracting the least-significant word of _rs1_ and shifting it left by 3 places. + +Operation:: +[source,sail] +-- +let base = X(rs2); +let index = EXTZ(X(rs1)[31..0]); + +X(rd) = base + (index << 3); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zba (<<#zba>>) +|0.93 +|Frozen +|=== diff --git a/src/insns/slli_uw.adoc b/src/insns/slli_uw.adoc new file mode 100644 index 0000000..776d33e --- /dev/null +++ b/src/insns/slli_uw.adoc @@ -0,0 +1,51 @@ +[#insns-slli_uw,reftext="Shift-left unsigned word (Immediate)"] +==== slli.uw + +Synopsis:: +Shift-left unsigned word (Immediate) + +Mnemonic:: +slli.uw _rd_, _rs1_, _shamt_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x1b, attr: ['OP-IMM-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['SLLI.UW'] }, + { bits: 5, name: 'rs1' }, + { bits: 6, name: 'shamt' }, + { bits: 6, name: 0x02, attr: ['SLLI.UW'] }, +]} +.... + +Description:: +This instruction takes the least-significant word of _rs1_, zero-extends it, and shifts it left by the immediate. + +Operation:: +[source,sail] +-- +X(rd) = (EXTZ(X(rs1)[31..0]) << shamt); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zba (<<#zba>>) +|0.93 +|Frozen +|=== + + +.Architecture Explanation +[NOTE, caption="A" ] +=============================================================== +This instruction is the same as *slli* with *zext.w* performed on _rs1_ before shifting. +=============================================================== + + diff --git a/src/insns/unzip.adoc b/src/insns/unzip.adoc new file mode 100644 index 0000000..c1d3644 --- /dev/null +++ b/src/insns/unzip.adoc @@ -0,0 +1,60 @@ +[#insns-unzip,reftext="Bit deinterleave"] +==== unzip + +Synopsis:: +Implements the inverse of the zip instruction.
+ +Mnemonic:: +unzip _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ +{bits: 2, name: 0x3}, +{bits: 5, name: 0x4}, +{bits: 5, name: 'rd'}, +{bits: 3, name: 0x5}, +{bits: 5, name: 'rs1'}, +{bits: 5, name: 0x0f}, +{bits: 7, name: 0x4}, +]} +.... + +Description:: +This instruction scatters all of the odd and even bits of a source word into +the high and low halves of a destination word. +It is the inverse of the <<insns-zip>> instruction. +This instruction is available only on RV32. + +Operation:: +[source,sail] +-- +foreach (i from 0 to xlen/2-1) { + X(rd)[i] = X(rs1)[2*i]; + X(rd)[i+xlen/2] = X(rs1)[2*i+1]; +} +-- + +.Software Hint +[NOTE, caption="SH" ] +=============================================================== +This instruction is useful for implementing the SHA3 cryptographic +hash function on a 32-bit architecture, as it implements the +bit-interleaving operation used to speed up the 64-bit rotations +directly. +=============================================================== + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbkb (<<#zbkb>>) (RV32) +|v0.9.4 +|Frozen +|=== + + diff --git a/src/insns/xnor.adoc b/src/insns/xnor.adoc new file mode 100644 index 0000000..63099e0 --- /dev/null +++ b/src/insns/xnor.adoc @@ -0,0 +1,47 @@ +[#insns-xnor,reftext="Exclusive NOR"] +==== xnor + +Synopsis:: +Exclusive NOR + +Mnemonic:: +xnor _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x4, attr: ['XNOR']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x20, attr: ['XNOR'] }, +]} +.... + +Description:: +This instruction performs the bit-wise exclusive-NOR operation on _rs1_ and _rs2_.
+ +Operation:: +[source,sail] +-- +X(rd) = ~(X(rs1) ^ X(rs2)); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/xpermb.adoc b/src/insns/xpermb.adoc new file mode 100644 index 0000000..73298f8 --- /dev/null +++ b/src/insns/xpermb.adoc @@ -0,0 +1,60 @@ +[#insns-xpermb,reftext="Crossbar permutation (bytes)"] +==== xperm.b + +Synopsis:: +Byte-wise lookup of indices into a vector in registers. + +Mnemonic:: +xperm.b _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ +{bits: 2, name: 0x3}, +{bits: 5, name: 0xc}, +{bits: 5, name: 'rd'}, +{bits: 3, name: 0x4}, +{bits: 5, name: 'rs1'}, +{bits: 5, name: 'rs2'}, +{bits: 7, name: 0x14}, +]} +.... + +Description:: +The xperm.b instruction operates on bytes. +The _rs1_ register contains a vector of XLEN/8 8-bit elements. +The _rs2_ register contains a vector of XLEN/8 8-bit indexes. +The result is each element in _rs2_ replaced by the indexed element in _rs1_, +or zero if the index into _rs1_ is out of bounds. + +Operation:: +[source,sail] +-- +val xpermb_lookup : (bits(8), xlenbits) -> bits(8) +function xpermb_lookup (idx, lut) = { + (lut >> (idx @ 0b000))[7..0] +} + +function clause execute ( XPERM_B (rs2,rs1,rd)) = { + result : xlenbits = EXTZ(0b0); + foreach(i from 0 to (xlen - 8) by 8) { + result[i+7..i] = xpermb_lookup(X(rs2)[i+7..i], X(rs1)); + }; + X(rd) = result; + RETIRE_SUCCESS +} +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbkx (<<#zbkx>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/xpermn.adoc b/src/insns/xpermn.adoc new file mode 100644 index 0000000..22d9c19 --- /dev/null +++ b/src/insns/xpermn.adoc @@ -0,0 +1,60 @@ +[#insns-xpermn,reftext="Crossbar permutation (nibbles)"] +==== xperm.n + +Synopsis:: +Nibble-wise lookup of indices into a vector.
+ +Mnemonic:: +xperm.n _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ +{bits: 2, name: 0x3}, +{bits: 5, name: 0xc}, +{bits: 5, name: 'rd'}, +{bits: 3, name: 0x2}, +{bits: 5, name: 'rs1'}, +{bits: 5, name: 'rs2'}, +{bits: 7, name: 0x14}, +]} +.... + +Description:: +The xperm.n instruction operates on nibbles. +The _rs1_ register contains a vector of XLEN/4 4-bit elements. +The _rs2_ register contains a vector of XLEN/4 4-bit indexes. +The result is each element in _rs2_ replaced by the indexed element in _rs1_, +or zero if the index into _rs1_ is out of bounds. + +Operation:: +[source,sail] +-- +val xpermn_lookup : (bits(4), xlenbits) -> bits(4) +function xpermn_lookup (idx, lut) = { + (lut >> (idx @ 0b00))[3..0] +} + +function clause execute ( XPERM_N (rs2,rs1,rd)) = { + result : xlenbits = EXTZ(0b0); + foreach(i from 0 to (xlen - 4) by 4) { + result[i+3..i] = xpermn_lookup(X(rs2)[i+3..i], X(rs1)); + }; + X(rd) = result; + RETIRE_SUCCESS +} +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbkx (<<#zbkx>>) +|v0.9.4 +|Frozen +|=== + diff --git a/src/insns/zext_h.adoc b/src/insns/zext_h.adoc new file mode 100644 index 0000000..cae2105 --- /dev/null +++ b/src/insns/zext_h.adoc @@ -0,0 +1,61 @@ +[#insns-zext_h,reftext="Zero-extend halfword"] +==== zext.h + +Synopsis:: +Zero-extend halfword + +Mnemonic:: +zext.h _rd_, _rs_ + +Encoding (RV32):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x4, attr: ['ZEXT.H']}, + { bits: 5, name: 'rs' }, + { bits: 5, name: 0x00 }, + { bits: 7, name: 0x04 }, +]} +.... + +Encoding (RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x3b, attr: ['OP-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x4, attr: ['ZEXT.H']}, + { bits: 5, name: 'rs' }, + { bits: 5, name: 0x00 }, + { bits: 7, name: 0x04 }, +]} +....
+ +Description:: +This instruction zero-extends the least-significant halfword of the source to XLEN by inserting 0's into all of the bits more significant than bit 15. + +Operation:: +[source,sail] +-- +X(rd) = EXTZ(X(rs)[15..0]); +-- + +.Note +[NOTE, caption="A" ] +=============================================================== +The *zext.h* mnemonic corresponds to different instruction encodings in RV32 and RV64. +=============================================================== + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== diff --git a/src/insns/zip.adoc b/src/insns/zip.adoc new file mode 100644 index 0000000..fcb5860 --- /dev/null +++ b/src/insns/zip.adoc @@ -0,0 +1,60 @@ +[#insns-zip,reftext="Bit interleave"] +==== zip + +Synopsis:: +Interleave the upper and lower halves of the source word into the odd and even bits of the +destination. + +Mnemonic:: +zip _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ +{bits: 2, name: 0x3}, +{bits: 5, name: 0x4}, +{bits: 5, name: 'rd'}, +{bits: 3, name: 0x1}, +{bits: 5, name: 'rs1'}, +{bits: 5, name: 0x0f}, +{bits: 7, name: 0x4}, +]} +.... + +Description:: +This instruction gathers bits from the high and low halves of the source +word into odd/even bit positions in the destination word. +It is the inverse of the <<insns-unzip>> instruction. +This instruction is available only on RV32. + +Operation:: +[source,sail] +-- +foreach (i from 0 to xlen/2-1) { + X(rd)[2*i] = X(rs1)[i]; + X(rd)[2*i+1] = X(rs1)[i+xlen/2]; +} +-- + +.Software Hint +[NOTE, caption="SH" ] +=============================================================== +This instruction is useful for implementing the SHA3 cryptographic +hash function on a 32-bit architecture, as it implements the +bit-interleaving operation used to speed up the 64-bit rotations +directly.
+=============================================================== + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbkb (<<#zbkb>>) (RV32) +|v0.9.4 +|Frozen +|=== + -- cgit v1.1 From d30efc9b8ca0ebd57827cf3f60c1491043d35a61 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 3 Aug 2023 12:06:55 -0400 Subject: Adding software optimization guide to bitmanip Adding the software optimization guide to bitmanip chapter. Note that in the original document this was included as an appendix. --- src/b-st-ext.adoc | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 144 insertions(+), 1 deletion(-) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index de8317e..f8e3999 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -1137,4 +1137,147 @@ include::insns/xpermn.adoc[] <<< include::insns/zext_h.adoc[] <<< -include::insns/zip.adoc[] \ No newline at end of file +include::insns/zip.adoc[] + +=== Software optimization guide + +==== strlen + +The *orc.b* instruction allows for the efficient detection of *NUL* bytes in an XLEN-sized chunk of data: + + * the result of *orc.b* on a chunk that does not contain any *NUL* bytes will be all-ones, and + * after a bitwise-negation of the result of *orc.b*, the number of data bytes before the first *NUL* byte (if any) can be detected by *ctz*/*clz* (depending on the endianness of data). 
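The two observations in the list above can be checked with a small Python model. This is a sketch under stated assumptions: it treats the chunk as a little-endian 64-bit value (so *ctz* is the relevant count), and the helper names `orc_b`, `ctz` and `bytes_before_nul` are introduced here for illustration rather than taken from the specification.

```python
def orc_b(value: int, xlen: int = 64) -> int:
    """Model of orc.b: each non-zero byte becomes 0xFF, each zero byte stays 0x00."""
    result = 0
    for byte_index in range(xlen // 8):
        if (value >> (8 * byte_index)) & 0xFF:
            result |= 0xFF << (8 * byte_index)
    return result


def ctz(value: int, xlen: int = 64) -> int:
    """Count trailing zero bits; returns xlen when the input is zero."""
    if value == 0:
        return xlen
    return (value & -value).bit_length() - 1


def bytes_before_nul(chunk: int, xlen: int = 64) -> int:
    """Number of data bytes before the first NUL in a little-endian chunk.

    Returns xlen // 8 when the chunk holds no NUL byte at all.
    """
    inverted = ~orc_b(chunk, xlen) & ((1 << xlen) - 1)  # NUL bytes become 0xFF
    return ctz(inverted, xlen) >> 3                     # bit index -> byte index
```

For example, the chunk loaded from the bytes `abc\0defg` gives 3, while a chunk without any *NUL* byte gives 8, which signals that the scan has to continue with the next chunk.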
+ +A full example of a *strlen* function, which uses these techniques and also demonstrates the use of it for unaligned/partial data, is the following: + +[source,asm] +-- +#include + + .text + .globl strlen + .type strlen, @function +strlen: + andi a3, a0, (SZREG-1) // offset + andi a1, a0, -SZREG // align pointer +.Lprologue: + li a4, SZREG + sub a4, a4, a3 // XLEN - offset + slli a3, a3, PTRLOG // offset * 8 + REG_L a2, 0(a1) // chunk + /* + * Shift the partial/unaligned chunk we loaded to remove the bytes + * from before the start of the string, adding NUL bytes at the end. + */ +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ + srl a2, a2 ,a3 // chunk >> (offset * 8) +#else + sll a2, a2, a3 +#endif + orc.b a2, a2 + not a2, a2 + /* + * Non-NUL bytes in the string have been expanded to 0x00, while + * NUL bytes have become 0xff. Search for the first set bit + * (corresponding to a NUL byte in the original chunk). + */ +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ + ctz a2, a2 +#else + clz a2, a2 +#endif + /* + * The first chunk is special: compare against the number of valid + * bytes in this chunk. + */ + srli a0, a2, 3 + bgtu a4, a0, .Ldone + addi a3, a1, SZREG + li a4, -1 + .align 2 + /* + * Our critical loop is 4 instructions and processes data in 4 byte + * or 8 byte chunks. 
+ */ +.Lloop: + REG_L a2, SZREG(a1) + addi a1, a1, SZREG + orc.b a2, a2 + beq a2, a4, .Lloop + +.Lepilogue: + not a2, a2 +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ + ctz a2, a2 +#else + clz a2, a2 +#endif + sub a1, a1, a3 + add a0, a0, a1 + srli a2, a2, 3 + add a0, a0, a2 +.Ldone: + ret +-- + +==== strcmp + +[source,asm] +-- +#include + + .text + .globl strcmp + .type strcmp, @function +strcmp: + or a4, a0, a1 + li t2, -1 + and a4, a4, SZREG-1 + bnez a4, .Lsimpleloop + + # Main loop for aligned strings +.Lloop: + REG_L a2, 0(a0) + REG_L a3, 0(a1) + orc.b t0, a2 + bne t0, t2, .Lfoundnull + addi a0, a0, SZREG + addi a1, a1, SZREG + beq a2, a3, .Lloop + + # Words don't match, and no null byte in first word. + # Get bytes in big-endian order and compare. +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ + rev8 a2, a2 + rev8 a3, a3 +#endif + # Synthesize (a2 >= a3) ? 1 : -1 in a branchless sequence. + sltu a0, a2, a3 + neg a0, a0 + ori a0, a0, 1 + ret + +.Lfoundnull: + # Found a null byte. + # If words don't match, fall back to simple loop. + bne a2, a3, .Lsimpleloop + + # Otherwise, strings are equal. + li a0, 0 + ret + + # Simple loop for misaligned strings +.Lsimpleloop: + lbu a2, 0(a0) + lbu a3, 0(a1) + addi a0, a0, 1 + addi a1, a1, 1 + bne a2, a3, 1f + bnez a2, .Lsimpleloop + +1: + sub a0, a2, a3 + ret + +.size strcmp, .-strcmp +-- \ No newline at end of file -- cgit v1.1 From 7b897205caabbea23dec199f675e6d44f3c74e66 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Fri, 4 Aug 2023 11:18:45 -0400 Subject: Fixed vfrsqrt7 table. Due to a bug in prawn-pdf I had to add cells to the table to hold the 0 and 1 in each row vs spanning 64 rows each. 
--- src/images/wavedrom/vfrsqrt7.adoc | 259 +++++++++++++++++++------------------- 1 file changed, 129 insertions(+), 130 deletions(-) diff --git a/src/images/wavedrom/vfrsqrt7.adoc b/src/images/wavedrom/vfrsqrt7.adoc index 10e2958..8ebc621 100644 --- a/src/images/wavedrom/vfrsqrt7.adoc +++ b/src/images/wavedrom/vfrsqrt7.adoc @@ -1,138 +1,137 @@ .vfrsqrt7.v common-case lookup table contents -[%autowidth,float=center,align=center,cols="<,<,<",options="header"] +[%autowidth,float=center,align=center,options="header"] |=== |exp[0] | sig[MSB -: 6] | sig_out[MSB -: 7] -.64+|0| 0 | 52 -| 1 | 51 -| 2 | 50 -| 3 | 48 -| 4 | 47 -| 5 | 46 -| 6 | 44 -| 7 | 43 -| 8 | 42 -| 9 | 41 -| 10 | 40 -| 11 | 39 -| 12 | 38 -| 13 | 36 -| 14 | 35 -| 15 | 34 -| 16 | 33 -| 17 | 32 -| 18 | 31 -| 19 | 30 -| 20 | 30 -| 21 | 29 -| 22 | 28 -| 23 | 27 -| 24 | 26 -| 25 | 25 -| 26 | 24 -| 27 | 23 -| 28 | 23 -| 29 | 22 -| 30 | 21 -| 31 | 20 -| 32 | 19 -| 33 | 19 -| 34 | 18 -| 35 | 17 -| 36 | 16 -| 37 | 16 -| 38 | 15 -| 39 | 14 -| 40 | 14 -| 41 | 13 -| 42 | 12 -| 43 | 12 -| 44 | 11 -| 45 | 10 -| 46 | 10 -| 47 | 9 -| 48 | 9 -| 49 | 8 -| 50 | 7 -| 51 | 7 -| 52 | 6 -| 53 | 6 -| 54 | 5 -| 55 | 4 -| 56 | 4 -| 57 | 3 -| 58 | 3 -| 59 | 2 -| 60 | 2 -| 61 | 1 -| 62 | 1 -| 63 | 0 +| 0| 0 | 52 +| 0| 1 | 51 +| 0| 2 | 50 +| 0| 3 | 48 +| 0| 4 | 47 +| 0| 5 | 46 +| 0| 6 | 44 +| 0| 7 | 43 +| 0| 8 | 42 +| 0| 9 | 41 +| 0| 10 | 40 +| 0| 11 | 39 +| 0| 12 | 38 +| 0| 13 | 36 +| 0| 14 | 35 +| 0| 15 | 34 +| 0| 16 | 33 +| 0| 17 | 32 +| 0| 18 | 31 +| 0| 19 | 30 +| 0| 20 | 30 +| 0| 21 | 29 +| 0| 22 | 28 +| 0| 23 | 27 +| 0| 24 | 26 +| 0| 25 | 25 +| 0| 26 | 24 +| 0| 27 | 23 +| 0| 28 | 23 +| 0| 29 | 22 +| 0| 30 | 21 +| 0| 31 | 20 +| 0| 32 | 19 +| 0| 33 | 19 +| 0| 34 | 18 +| 0| 35 | 17 +| 0| 36 | 16 +| 0| 37 | 16 +| 0| 38 | 15 +| 0| 39 | 14 +| 0| 40 | 14 +| 0| 41 | 13 +| 0| 42 | 12 +| 0| 43 | 12 +| 0| 44 | 11 +| 0| 45 | 10 +| 0| 46 | 10 +| 0| 47 | 9 +| 0| 48 | 9 +| 0| 49 | 8 +| 0| 50 | 7 +| 0| 51 | 7 +| 0| 52 | 6 +| 0| 53 | 6 +| 0| 
54 | 5 +| 0| 55 | 4 +| 0| 56 | 4 +| 0| 57 | 3 +| 0| 58 | 3 +| 0| 59 | 2 +| 0| 60 | 2 +| 0| 61 | 1 +| 0| 62 | 1 +| 0| 63 | 0 -.64+|1 -| 0 | 127 -| 1 | 125 -| 2 | 123 -| 3 | 121 -| 4 | 119 -| 5 | 118 -| 6 | 116 -| 7 | 114 -| 8 | 113 -| 9 | 111 -| 10 | 109 -| 11 | 108 -| 12 | 106 -| 13 | 105 -| 14 | 103 -| 15 | 102 -| 16 | 100 -| 17 | 99 -| 18 | 97 -| 19 | 96 -| 20 | 95 -| 21 | 93 -| 22 | 92 -| 23 | 91 -| 24 | 90 -| 25 | 88 -| 26 | 87 -| 27 | 86 -| 28 | 85 -| 29 | 84 -| 30 | 83 -| 31 | 82 -| 32 | 80 -| 33 | 79 -| 34 | 78 -| 35 | 77 -| 36 | 76 -| 37 | 75 -| 38 | 74 -| 39 | 73 -| 40 | 72 -| 41 | 71 -| 42 | 70 -| 43 | 70 -| 44 | 69 -| 45 | 68 -| 46 | 67 -| 47 | 66 -| 48 | 65 -| 49 | 64 -| 50 | 63 -| 51 | 63 -| 52 | 62 -| 53 | 61 -| 54 | 60 -| 55 | 59 -| 56 | 59 -| 57 | 58 -| 58 | 57 -| 59 | 56 -| 60 | 56 -| 61 | 55 -| 62 | 54 -| 63 | 53 +| 1| 0 | 127 +| 1| 1 | 125 +| 1| 2 | 123 +| 1| 3 | 121 +| 1| 4 | 119 +| 1| 5 | 118 +| 1| 6 | 116 +| 1| 7 | 114 +| 1| 8 | 113 +| 1| 9 | 111 +| 1| 10 | 109 +| 1| 11 | 108 +| 1| 12 | 106 +| 1| 13 | 105 +| 1| 14 | 103 +| 1| 15 | 102 +| 1| 16 | 100 +| 1| 17 | 99 +| 1| 18 | 97 +| 1| 19 | 96 +| 1| 20 | 95 +| 1| 21 | 93 +| 1| 22 | 92 +| 1| 23 | 91 +| 1| 24 | 90 +| 1| 25 | 88 +| 1| 26 | 87 +| 1| 27 | 86 +| 1| 28 | 85 +| 1| 29 | 84 +| 1| 30 | 83 +| 1| 31 | 82 +| 1| 32 | 80 +| 1| 33 | 79 +| 1| 34 | 78 +| 1| 35 | 77 +| 1| 36 | 76 +| 1| 37 | 75 +| 1| 38 | 74 +| 1| 39 | 73 +| 1| 40 | 72 +| 1| 41 | 71 +| 1| 42 | 70 +| 1| 43 | 70 +| 1| 44 | 69 +| 1| 45 | 68 +| 1| 46 | 67 +| 1| 47 | 66 +| 1| 48 | 65 +| 1| 49 | 64 +| 1| 50 | 63 +| 1| 51 | 63 +| 1| 52 | 62 +| 1| 53 | 61 +| 1| 54 | 60 +| 1| 55 | 59 +| 1| 56 | 59 +| 1| 57 | 58 +| 1| 58 | 57 +| 1| 59 | 56 +| 1| 60 | 56 +| 1| 61 | 55 +| 1| 62 | 54 +| 1| 63 | 53 |=== \ No newline at end of file -- cgit v1.1 From a9c934e3a9af53f4f2c669d2f2802cd5114469b4 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Mon, 7 Aug 2023 14:26:12 -0400 Subject: Adding untracked files for vector chapter Adding untracked files for 
vector chapter --- src/inst-table.adoc | 209 ++++++++++++++++++++++++++++++++++++++++++++++++++ src/valu-format.adoc | 97 +++++++++++++++++++++++ src/vcfg-format.adoc | 44 +++++++++++ src/vfrec7.adoc | 136 ++++++++++++++++++++++++++++++++ src/vfrsqrt7.adoc | 139 +++++++++++++++++++++++++++++++++ src/vmem-format.adoc | 102 ++++++++++++++++++++++++ src/vtype-format.adoc | 27 +++++++ 7 files changed, 754 insertions(+) create mode 100644 src/inst-table.adoc create mode 100644 src/valu-format.adoc create mode 100644 src/vcfg-format.adoc create mode 100644 src/vfrec7.adoc create mode 100644 src/vfrsqrt7.adoc create mode 100644 src/vmem-format.adoc create mode 100644 src/vtype-format.adoc diff --git a/src/inst-table.adoc b/src/inst-table.adoc new file mode 100644 index 0000000..1c3511b --- /dev/null +++ b/src/inst-table.adoc @@ -0,0 +1,209 @@ + +// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] +|=== +5+| Integer 4+| Integer 4+| FP + +| funct3 | | | | | funct3 | | | | funct3 | | | +| OPIVV |V| | | | OPMVV |V| | | OPFVV |V| | +| OPIVX | |X| | | OPMVX | |X| | OPFVF | |F| +| OPIVI | | |I| | | | | | | | | +|=== + +// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] +|=== +5+| funct6 4+| funct6 4+| funct6 + +| 000000 |V|X|I| vadd | 000000 |V| | vredsum | 000000 |V|F| vfadd +| 000001 | | | | | 000001 |V| | vredand | 000001 |V| | vfredusum +| 000010 |V|X| | vsub | 000010 |V| | vredor | 000010 |V|F| vfsub +| 000011 | |X|I| vrsub | 000011 |V| | vredxor | 000011 |V| | vfredosum +| 000100 |V|X| | vminu | 000100 |V| | vredminu | 000100 |V|F| vfmin +| 000101 |V|X| | vmin | 000101 |V| | vredmin | 000101 |V| | vfredmin +| 000110 |V|X| | vmaxu | 000110 |V| | vredmaxu | 000110 |V|F| vfmax +| 000111 |V|X| | vmax | 000111 |V| | vredmax | 000111 |V| | vfredmax +| 001000 | | | | | 001000 |V|X| vaaddu | 001000 |V|F| vfsgnj +| 001001 |V|X|I| vand | 001001 |V|X| vaadd | 001001 |V|F| vfsgnjn +| 001010 |V|X|I| vor | 001010 |V|X| vasubu | 001010 |V|F| vfsgnjx +| 001011 |V|X|I| vxor | 001011 |V|X| vasub | 001011 | | | +| 
001100 |V|X|I| vrgather | 001100 | | | | 001100 | | | +| 001101 | | | | | 001101 | | | | 001101 | | | +| 001110 | |X|I| vslideup | 001110 | |X| vslide1up | 001110 | |F| vfslide1up +| 001110 |V| | |vrgatherei16| | | | | | | | +| 001111 | |X|I| vslidedown | 001111 | |X| vslide1down | 001111 | |F| vfslide1down +|=== + +// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] +|=== +5+| funct6 4+| funct6 4+| funct6 + +| 010000 |V|X|I| vadc | 010000 |V| | VWXUNARY0 | 010000 |V| | VWFUNARY0 +| | | | | | 010000 | |X| VRXUNARY0 | 010000 | |F| VRFUNARY0 +| 010001 |V|X|I| vmadc | 010001 | | | | 010001 | | | +| 010010 |V|X| | vsbc | 010010 |V| | VXUNARY0 | 010010 |V| | VFUNARY0 +| 010011 |V|X| | vmsbc | 010011 | | | | 010011 |V| | VFUNARY1 +| 010100 | | | | | 010100 |V| | VMUNARY0 | 010100 | | | +| 010101 | | | | | 010101 | | | | 010101 | | | +| 010110 | | | | | 010110 | | | | 010110 | | | +| 010111 |V|X|I| vmerge/vmv | 010111 |V| | vcompress | 010111 | |F| vfmerge/vfmv +| 011000 |V|X|I| vmseq | 011000 |V| | vmandn | 011000 |V|F| vmfeq +| 011001 |V|X|I| vmsne | 011001 |V| | vmand | 011001 |V|F| vmfle +| 011010 |V|X| | vmsltu | 011010 |V| | vmor | 011010 | | | +| 011011 |V|X| | vmslt | 011011 |V| | vmxor | 011011 |V|F| vmflt +| 011100 |V|X|I| vmsleu | 011100 |V| | vmorn | 011100 |V|F| vmfne +| 011101 |V|X|I| vmsle | 011101 |V| | vmnand | 011101 | |F| vmfgt +| 011110 | |X|I| vmsgtu | 011110 |V| | vmnor | 011110 | | | +| 011111 | |X|I| vmsgt | 011111 |V| | vmxnor | 011111 | |F| vmfge +|=== + +// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] +|=== +5+| funct6 4+| funct6 4+| funct6 + +| 100000 |V|X|I| vsaddu | 100000 |V|X| vdivu | 100000 |V|F| vfdiv +| 100001 |V|X|I| vsadd | 100001 |V|X| vdiv | 100001 | |F| vfrdiv +| 100010 |V|X| | vssubu | 100010 |V|X| vremu | 100010 | | | +| 100011 |V|X| | vssub | 100011 |V|X| vrem | 100011 | | | +| 100100 | | | | | 100100 |V|X| vmulhu | 100100 |V|F| vfmul +| 100101 |V|X|I| vsll | 100101 |V|X| vmul | 100101 | | | +| 100110 | | | | | 100110 |V|X| vmulhsu | 100110 | | | +| 
100111 |V|X| | vsmul | 100111 |V|X| vmulh | 100111 | |F| vfrsub +| 100111 | | |I| vmvr | | | | | | | | +| 101000 |V|X|I| vsrl | 101000 | | | | 101000 |V|F| vfmadd +| 101001 |V|X|I| vsra | 101001 |V|X| vmadd | 101001 |V|F| vfnmadd +| 101010 |V|X|I| vssrl | 101010 | | | | 101010 |V|F| vfmsub +| 101011 |V|X|I| vssra | 101011 |V|X| vnmsub | 101011 |V|F| vfnmsub +| 101100 |V|X|I| vnsrl | 101100 | | | | 101100 |V|F| vfmacc +| 101101 |V|X|I| vnsra | 101101 |V|X| vmacc | 101101 |V|F| vfnmacc +| 101110 |V|X|I| vnclipu | 101110 | | | | 101110 |V|F| vfmsac +| 101111 |V|X|I| vnclip | 101111 |V|X| vnmsac | 101111 |V|F| vfnmsac +|=== + +// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] +|=== +5+| funct6 4+| funct6 4+| funct6 + +| 110000 |V| | | vwredsumu | 110000 |V|X| vwaddu | 110000 |V|F| vfwadd +| 110001 |V| | | vwredsum | 110001 |V|X| vwadd | 110001 |V| | vfwredusum +| 110010 | | | | | 110010 |V|X| vwsubu | 110010 |V|F| vfwsub +| 110011 | | | | | 110011 |V|X| vwsub | 110011 |V| | vfwredosum +| 110100 | | | | | 110100 |V|X| vwaddu.w | 110100 |V|F| vfwadd.w +| 110101 | | | | | 110101 |V|X| vwadd.w | 110101 | | | +| 110110 | | | | | 110110 |V|X| vwsubu.w | 110110 |V|F| vfwsub.w +| 110111 | | | | | 110111 |V|X| vwsub.w | 110111 | | | +| 111000 | | | | | 111000 |V|X| vwmulu | 111000 |V|F| vfwmul +| 111001 | | | | | 111001 | | | | 111001 | | | +| 111010 | | | | | 111010 |V|X| vwmulsu | 111010 | | | +| 111011 | | | | | 111011 |V|X| vwmul | 111011 | | | +| 111100 | | | | | 111100 |V|X| vwmaccu | 111100 |V|F| vfwmacc +| 111101 | | | | | 111101 |V|X| vwmacc | 111101 |V|F| vfwnmacc +| 111110 | | | | | 111110 | |X| vwmaccus | 111110 |V|F| vfwmsac +| 111111 | | | | | 111111 |V|X| vwmaccsu | 111111 |V|F| vfwnmsac +|=== + +<<< + +.VRXUNARY0 encoding space +[cols="2,14"] +|=== +| vs2 | + +| 00000 | vmv.s.x +|=== + +.VWXUNARY0 encoding space +[cols="2,14"] +|=== +| vs1 | + +| 00000 | vmv.x.s +| 10000 | vcpop +| 10001 | vfirst +|=== + +.VXUNARY0 encoding space +[cols="2,14"] +|=== +| vs1 | + +| 00010 | 
vzext.vf8 +| 00011 | vsext.vf8 +| 00100 | vzext.vf4 +| 00101 | vsext.vf4 +| 00110 | vzext.vf2 +| 00111 | vsext.vf2 +|=== + +.VRFUNARY0 encoding space +[cols="2,14"] +|=== +| vs2 | + +| 00000 | vfmv.s.f +|=== + +.VWFUNARY0 encoding space +[cols="2,14"] +|=== +| vs1 | + +| 00000 | vfmv.f.s +|=== + +.VFUNARY0 encoding space +[cols="2,14"] +|=== +| vs1 | name + +2+| single-width converts +| 00000 | vfcvt.xu.f.v +| 00001 | vfcvt.x.f.v +| 00010 | vfcvt.f.xu.v +| 00011 | vfcvt.f.x.v +| 00110 | vfcvt.rtz.xu.f.v +| 00111 | vfcvt.rtz.x.f.v +| | +2+| widening converts +| 01000 | vfwcvt.xu.f.v +| 01001 | vfwcvt.x.f.v +| 01010 | vfwcvt.f.xu.v +| 01011 | vfwcvt.f.x.v +| 01100 | vfwcvt.f.f.v +| 01110 | vfwcvt.rtz.xu.f.v +| 01111 | vfwcvt.rtz.x.f.v +| | +2+| narrowing converts +| 10000 | vfncvt.xu.f.w +| 10001 | vfncvt.x.f.w +| 10010 | vfncvt.f.xu.w +| 10011 | vfncvt.f.x.w +| 10100 | vfncvt.f.f.w +| 10101 | vfncvt.rod.f.f.w +| 10110 | vfncvt.rtz.xu.f.w +| 10111 | vfncvt.rtz.x.f.w +|=== + +.VFUNARY1 encoding space +[cols="2,14"] +|=== +| vs1 | name + +| 00000 | vfsqrt.v +| 00100 | vfrsqrt7.v +| 00101 | vfrec7.v +| 10000 | vfclass.v +|=== + + +.VMUNARY0 encoding space +[cols="2,14"] +|=== +| vs1 | + +| 00001 | vmsbf +| 00010 | vmsof +| 00011 | vmsif +| 10000 | viota +| 10001 | vid +|=== + + diff --git a/src/valu-format.adoc b/src/valu-format.adoc new file mode 100644 index 0000000..c6f6f52 --- /dev/null +++ b/src/valu-format.adoc @@ -0,0 +1,97 @@ +Formats for Vector Arithmetic Instructions under OP-V major opcode + +//// +31 26 25 24 20 19 15 14 12 11 7 6 0 + funct6 | vm | vs2 | vs1 | 0 0 0 | vd |1010111| OP-V (OPIVV) + funct6 | vm | vs2 | vs1 | 0 0 1 | vd/rd |1010111| OP-V (OPFVV) + funct6 | vm | vs2 | vs1 | 0 1 0 | vd/rd |1010111| OP-V (OPMVV) + funct6 | vm | vs2 | imm[4:0] | 0 1 1 | vd |1010111| OP-V (OPIVI) + funct6 | vm | vs2 | rs1 | 1 0 0 | vd |1010111| OP-V (OPIVX) + funct6 | vm | vs2 | rs1 | 1 0 1 | vd |1010111| OP-V (OPFVF) + funct6 | vm | vs2 | rs1 | 1 1 0 | vd/rd 
|1010111| OP-V (OPMVX) + 6 1 5 5 3 5 7 +//// + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'OPIVV'}, + {bits: 5, name: 'vd', type: 2}, + {bits: 3, name: 0}, + {bits: 5, name: 'vs1', type: 2}, + {bits: 5, name: 'vs2', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 6, name: 'funct6'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'OPFVV'}, + {bits: 5, name: 'vd / rd', type: 7}, + {bits: 3, name: 1}, + {bits: 5, name: 'vs1', type: 2}, + {bits: 5, name: 'vs2', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 6, name: 'funct6'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'OPMVV'}, + {bits: 5, name: 'vd / rd', type: 7}, + {bits: 3, name: 2}, + {bits: 5, name: 'vs1', type: 2}, + {bits: 5, name: 'vs2', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 6, name: 'funct6'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: ['OPIVI']}, + {bits: 5, name: 'vd', type: 2}, + {bits: 3, name: 3}, + {bits: 5, name: 'imm[4:0]', type: 5}, + {bits: 5, name: 'vs2', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 6, name: 'funct6'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'OPIVX'}, + {bits: 5, name: 'vd', type: 2}, + {bits: 3, name: 4}, + {bits: 5, name: 'rs1', type: 4}, + {bits: 5, name: 'vs2', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 6, name: 'funct6'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'OPFVF'}, + {bits: 5, name: 'vd', type: 2}, + {bits: 3, name: 5}, + {bits: 5, name: 'rs1', type: 4}, + {bits: 5, name: 'vs2', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 6, name: 'funct6'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'OPMVX'}, + {bits: 5, name: 'vd / rd', type: 7}, + {bits: 3, name: 6}, + {bits: 5, name: 'rs1', type: 4}, + {bits: 5, name: 'vs2', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 6, name: 'funct6'}, +]} +``` diff --git a/src/vcfg-format.adoc b/src/vcfg-format.adoc new file mode 100644 index 0000000..f1bb4c0 --- /dev/null +++ b/src/vcfg-format.adoc @@ -0,0 +1,44 @@ 
+Formats for Vector Configuration Instructions under OP-V major opcode + +//// + 31 30 25 24 20 19 15 14 12 11 7 6 0 + 0 | zimm[10:0] | rs1 | 1 1 1 | rd |1010111| vsetvli + 1 | 1| zimm[ 9:0] | uimm[4:0]| 1 1 1 | rd |1010111| vsetivli + 1 | 000000 | rs2 | rs1 | 1 1 1 | rd |1010111| vsetvl + 1 6 5 5 3 5 7 +//// + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'vsetvli'}, + {bits: 5, name: 'rd', type: 4}, + {bits: 3, name: 7}, + {bits: 5, name: 'rs1', type: 4}, + {bits: 11, name: 'vtypei[10:0]', type: 5}, + {bits: 1, name: '0'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'vsetivli'}, + {bits: 5, name: 'rd', type: 4}, + {bits: 3, name: 7}, + {bits: 5, name: 'uimm[4:0]', type: 5}, + {bits: 10, name: 'vtypei[9:0]', type: 5}, + {bits: 1, name: '1'}, + {bits: 1, name: '1'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x57, attr: 'vsetvl'}, + {bits: 5, name: 'rd', type: 4}, + {bits: 3, name: 7}, + {bits: 5, name: 'rs1', type: 4}, + {bits: 5, name: 'rs2', type: 4}, + {bits: 6, name: 0x00}, + {bits: 1, name: 1}, +]} +``` diff --git a/src/vfrec7.adoc b/src/vfrec7.adoc new file mode 100644 index 0000000..02abe60 --- /dev/null +++ b/src/vfrec7.adoc @@ -0,0 +1,136 @@ +.vfrec7.v common-case lookup table contents +[%autowidth] +|=== + +| sig[MSB -: 7] | sig_out[MSB -: 7] + +| 0 | 127 +| 1 | 125 +| 2 | 123 +| 3 | 121 +| 4 | 119 +| 5 | 117 +| 6 | 116 +| 7 | 114 +| 8 | 112 +| 9 | 110 +| 10 | 109 +| 11 | 107 +| 12 | 105 +| 13 | 104 +| 14 | 102 +| 15 | 100 +| 16 | 99 +| 17 | 97 +| 18 | 96 +| 19 | 94 +| 20 | 93 +| 21 | 91 +| 22 | 90 +| 23 | 88 +| 24 | 87 +| 25 | 85 +| 26 | 84 +| 27 | 83 +| 28 | 81 +| 29 | 80 +| 30 | 79 +| 31 | 77 +| 32 | 76 +| 33 | 75 +| 34 | 74 +| 35 | 72 +| 36 | 71 +| 37 | 70 +| 38 | 69 +| 39 | 68 +| 40 | 66 +| 41 | 65 +| 42 | 64 +| 43 | 63 +| 44 | 62 +| 45 | 61 +| 46 | 60 +| 47 | 59 +| 48 | 58 +| 49 | 57 +| 50 | 56 +| 51 | 55 +| 52 | 54 +| 53 | 53 +| 54 | 52 +| 55 | 51 +| 56 | 50 +| 57 | 49 +| 58 | 48 +| 59 | 47 +| 60 | 46 +| 61 | 45 +| 
62 | 44 +| 63 | 43 +| 64 | 42 +| 65 | 41 +| 66 | 40 +| 67 | 40 +| 68 | 39 +| 69 | 38 +| 70 | 37 +| 71 | 36 +| 72 | 35 +| 73 | 35 +| 74 | 34 +| 75 | 33 +| 76 | 32 +| 77 | 31 +| 78 | 31 +| 79 | 30 +| 80 | 29 +| 81 | 28 +| 82 | 28 +| 83 | 27 +| 84 | 26 +| 85 | 25 +| 86 | 25 +| 87 | 24 +| 88 | 23 +| 89 | 23 +| 90 | 22 +| 91 | 21 +| 92 | 21 +| 93 | 20 +| 94 | 19 +| 95 | 19 +| 96 | 18 +| 97 | 17 +| 98 | 17 +| 99 | 16 +| 100 | 15 +| 101 | 15 +| 102 | 14 +| 103 | 14 +| 104 | 13 +| 105 | 12 +| 106 | 12 +| 107 | 11 +| 108 | 11 +| 109 | 10 +| 110 | 9 +| 111 | 9 +| 112 | 8 +| 113 | 8 +| 114 | 7 +| 115 | 7 +| 116 | 6 +| 117 | 5 +| 118 | 5 +| 119 | 4 +| 120 | 4 +| 121 | 3 +| 122 | 3 +| 123 | 2 +| 124 | 2 +| 125 | 1 +| 126 | 1 +| 127 | 0 + +|=== diff --git a/src/vfrsqrt7.adoc b/src/vfrsqrt7.adoc new file mode 100644 index 0000000..ace8022 --- /dev/null +++ b/src/vfrsqrt7.adoc @@ -0,0 +1,139 @@ +.vfrsqrt7.v common-case lookup table contents +[%autowidth] +|=== + +|exp[0] | sig[MSB -: 6] | sig_out[MSB -: 7] + +.64+|0 +| 0 | 52 +| 1 | 51 +| 2 | 50 +| 3 | 48 +| 4 | 47 +| 5 | 46 +| 6 | 44 +| 7 | 43 +| 8 | 42 +| 9 | 41 +| 10 | 40 +| 11 | 39 +| 12 | 38 +| 13 | 36 +| 14 | 35 +| 15 | 34 +| 16 | 33 +| 17 | 32 +| 18 | 31 +| 19 | 30 +| 20 | 30 +| 21 | 29 +| 22 | 28 +| 23 | 27 +| 24 | 26 +| 25 | 25 +| 26 | 24 +| 27 | 23 +| 28 | 23 +| 29 | 22 +| 30 | 21 +| 31 | 20 +| 32 | 19 +| 33 | 19 +| 34 | 18 +| 35 | 17 +| 36 | 16 +| 37 | 16 +| 38 | 15 +| 39 | 14 +| 40 | 14 +| 41 | 13 +| 42 | 12 +| 43 | 12 +| 44 | 11 +| 45 | 10 +| 46 | 10 +| 47 | 9 +| 48 | 9 +| 49 | 8 +| 50 | 7 +| 51 | 7 +| 52 | 6 +| 53 | 6 +| 54 | 5 +| 55 | 4 +| 56 | 4 +| 57 | 3 +| 58 | 3 +| 59 | 2 +| 60 | 2 +| 61 | 1 +| 62 | 1 +| 63 | 0 + +.64+|1 +| 0 | 127 +| 1 | 125 +| 2 | 123 +| 3 | 121 +| 4 | 119 +| 5 | 118 +| 6 | 116 +| 7 | 114 +| 8 | 113 +| 9 | 111 +| 10 | 109 +| 11 | 108 +| 12 | 106 +| 13 | 105 +| 14 | 103 +| 15 | 102 +| 16 | 100 +| 17 | 99 +| 18 | 97 +| 19 | 96 +| 20 | 95 +| 21 | 93 +| 22 | 92 +| 23 | 91 +| 24 | 90 +| 25 | 88 +| 
26 | 87 +| 27 | 86 +| 28 | 85 +| 29 | 84 +| 30 | 83 +| 31 | 82 +| 32 | 80 +| 33 | 79 +| 34 | 78 +| 35 | 77 +| 36 | 76 +| 37 | 75 +| 38 | 74 +| 39 | 73 +| 40 | 72 +| 41 | 71 +| 42 | 70 +| 43 | 70 +| 44 | 69 +| 45 | 68 +| 46 | 67 +| 47 | 66 +| 48 | 65 +| 49 | 64 +| 50 | 63 +| 51 | 63 +| 52 | 62 +| 53 | 61 +| 54 | 60 +| 55 | 59 +| 56 | 59 +| 57 | 58 +| 58 | 57 +| 59 | 56 +| 60 | 56 +| 61 | 55 +| 62 | 54 +| 63 | 53 + +|=== diff --git a/src/vmem-format.adoc b/src/vmem-format.adoc new file mode 100644 index 0000000..3b20043 --- /dev/null +++ b/src/vmem-format.adoc @@ -0,0 +1,102 @@ +Format for Vector Load Instructions under LOAD-FP major opcode + +//// +31 29 28 27 26 25 24 20 19 15 14 12 11 7 6 0 + nf | mew| mop | vm | lumop | rs1 | width | vd |0000111| VL* unit-stride + nf | mew| mop | vm | rs2 | rs1 | width | vd |0000111| VLS* strided + nf | mew| mop | vm | vs2 | rs1 | width | vd |0000111| VLX* indexed + 3 1 2 1 5 5 3 5 7 +//// + +```wavedrom +{reg: [ + {bits: 7, name: 0x7, attr: 'VL* unit-stride'}, + {bits: 5, name: 'vd', attr: 'destination of load', type: 2}, + {bits: 3, name: 'width'}, + {bits: 5, name: 'rs1', attr: 'base address', type: 4}, + {bits: 5, name: 'lumop'}, + {bits: 1, name: 'vm'}, + {bits: 2, name: 'mop'}, + {bits: 1, name: 'mew'}, + {bits: 3, name: 'nf'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x7, attr: 'VLS* strided'}, + {bits: 5, name: 'vd', attr: 'destination of load', type: 2}, + {bits: 3, name: 'width'}, + {bits: 5, name: 'rs1', attr: 'base address', type: 4}, + {bits: 5, name: 'rs2', attr: 'stride', type: 4}, + {bits: 1, name: 'vm'}, + {bits: 2, name: 'mop'}, + {bits: 1, name: 'mew'}, + {bits: 3, name: 'nf'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x7, attr: 'VLX* indexed'}, + {bits: 5, name: 'vd', attr: 'destination of load', type: 2}, + {bits: 3, name: 'width'}, + {bits: 5, name: 'rs1', attr: 'base address', type: 4}, + {bits: 5, name: 'vs2', attr: 'address offsets', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 2, name: 
'mop'}, + {bits: 1, name: 'mew'}, + {bits: 3, name: 'nf'}, +]} +``` +Format for Vector Store Instructions under STORE-FP major opcode + +//// +31 29 28 27 26 25 24 20 19 15 14 12 11 7 6 0 + nf | mew| mop | vm | sumop | rs1 | width | vs3 |0100111| VS* unit-stride + nf | mew| mop | vm | rs2 | rs1 | width | vs3 |0100111| VSS* strided + nf | mew| mop | vm | vs2 | rs1 | width | vs3 |0100111| VSX* indexed + 3 1 2 1 5 5 3 5 7 +//// + +```wavedrom +{reg: [ + {bits: 7, name: 0x27, attr: 'VS* unit-stride'}, + {bits: 5, name: 'vs3', attr: 'store data', type: 2}, + {bits: 3, name: 'width'}, + {bits: 5, name: 'rs1', attr: 'base address', type: 4}, + {bits: 5, name: 'sumop'}, + {bits: 1, name: 'vm'}, + {bits: 2, name: 'mop'}, + {bits: 1, name: 'mew'}, + {bits: 3, name: 'nf'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x27, attr: 'VSS* strided'}, + {bits: 5, name: 'vs3', attr: 'store data', type: 2}, + {bits: 3, name: 'width'}, + {bits: 5, name: 'rs1', attr: 'base address', type: 4}, + {bits: 5, name: 'rs2', attr: 'stride', type: 4}, + {bits: 1, name: 'vm'}, + {bits: 2, name: 'mop'}, + {bits: 1, name: 'mew'}, + {bits: 3, name: 'nf'}, +]} +``` + +```wavedrom +{reg: [ + {bits: 7, name: 0x27, attr: 'VSX* indexed'}, + {bits: 5, name: 'vs3', attr: 'store data', type: 2}, + {bits: 3, name: 'width'}, + {bits: 5, name: 'rs1', attr: 'base address', type: 4}, + {bits: 5, name: 'vs2', attr: 'address offsets', type: 2}, + {bits: 1, name: 'vm'}, + {bits: 2, name: 'mop'}, + {bits: 1, name: 'mew'}, + {bits: 3, name: 'nf'}, +]} +``` diff --git a/src/vtype-format.adoc b/src/vtype-format.adoc new file mode 100644 index 0000000..a97af34 --- /dev/null +++ b/src/vtype-format.adoc @@ -0,0 +1,27 @@ +```wavedrom +{reg: [ + {bits: 3, name: 'vlmul[2:0]'}, + {bits: 3, name: 'vsew[2:0]'}, + {bits: 1, name: 'vta'}, + {bits: 1, name: 'vma'}, + {bits: 23, name: 'reserved'}, + {bits: 1, name: 'vill'}, +]} +``` + +NOTE: This diagram shows the layout for RV32 systems, whereas in +general `vill` should be 
at bit XLEN-1. + +.`vtype` register layout +[cols=">2,4,10"] +[%autowidth] +|=== +| Bits | Name | Description + +| XLEN-1 | vill | Illegal value if set +| XLEN-2:8 | 0 | Reserved if non-zero +| 7 | vma | Vector mask agnostic +| 6 | vta | Vector tail agnostic +| 5:3 | vsew[2:0] | Selected element width (SEW) setting +| 2:0 | vlmul[2:0] | Vector register group multiplier (LMUL) setting +|=== -- cgit v1.1 From 7feae1cc26210518a662ca4f8c879382be83fa97 Mon Sep 17 00:00:00 2001 From: Andrew Waterman Date: Fri, 30 Jun 2023 15:51:35 -0700 Subject: Clarify that vf[red]min/max perform minimumNumber/maximumNumber --- src/v-st-ext.adoc | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc index cc3e8b1..81c764c 100644 --- a/src/v-st-ext.adoc +++ b/src/v-st-ext.adoc @@ -3571,7 +3571,8 @@ with greater estimate accuracy. The vector floating-point `vfmin` and `vfmax` instructions have the same behavior as the corresponding scalar floating-point instructions -in version 2.2 of the RISC-V F/D/Q extension. +in version 2.2 of the RISC-V F/D/Q extension: they perform the `minimumNumber` +or `maximumNumber` operation on active elements. ---- # Floating-point minimum @@ -4000,6 +4001,10 @@ NOTE: The `vfredosum` instruction is a valid implementation of the ===== Vector Single-Width Floating-Point Max and Min Reductions +The `vfredmin` and `vfredmax` instructions reduce the scalar argument in +`vs1[0]` and active elements in `vs2` using the `minimumNumber` and +`maximumNumber` operations, respectively. + NOTE: Floating-point max and min reductions should return the same final value and raise the same exception flags regardless of operation order. 
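The `minimumNumber`/`maximumNumber` wording added in the hunk above can be illustrated with a small model. The following is a minimal sketch, not the specification's reference behavior: the helper names `minimum_number`, `maximum_number`, and `reduce_active` are hypothetical, exception flags are not modeled, and Python floats stand in for vector elements. It assumes the IEEE 754-2019 rules that a single NaN operand is ignored and that -0.0 orders below +0.0.

```python
import math
from typing import Callable, Sequence


def minimum_number(a: float, b: float) -> float:
    """Model of IEEE 754-2019 minimumNumber (exception flags not modeled)."""
    if math.isnan(a):
        return b  # if b is also NaN, the result is NaN
    if math.isnan(b):
        return a
    if a == b:
        # Only +0.0 and -0.0 compare equal yet differ; -0.0 is the smaller.
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b


def maximum_number(a: float, b: float) -> float:
    """Model of IEEE 754-2019 maximumNumber (exception flags not modeled)."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == b:
        # +0.0 is the larger of the two zeros.
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b


def reduce_active(
    op: Callable[[float, float], float],
    vs1_0: float,
    vs2: Sequence[float],
    mask: Sequence[bool],
) -> float:
    """Fold the scalar vs1[0] with the active elements of vs2 (hypothetical helper)."""
    acc = vs1_0
    for element, active in zip(vs2, mask):
        if active:  # inactive elements do not take part in the reduction
            acc = op(acc, element)
    return acc


if __name__ == "__main__":
    elements = [3.0, math.nan, -0.0, 0.0]
    all_active = [True] * len(elements)
    print(reduce_active(minimum_number, math.nan, elements, all_active))
    print(reduce_active(maximum_number, math.nan, elements, all_active))
```

Because a NaN is dropped whenever the other operand is a number, and the two zeros are ordered, the final value in this model does not depend on the order in which elements are folded, which is consistent with the NOTE above.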
--
cgit v1.1

From 69717a3f5d775788cd1d37b14667ca89bccc11f3 Mon Sep 17 00:00:00 2001
From: Michael Platzer
Date: Tue, 11 Jul 2023 18:12:35 +0200
Subject: Clarify that compares only AND in the mask if vd==v0

Fixes #900

Signed-off-by: Michael Platzer
---
 src/v-st-ext.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc
index 81c764c..1624a14 100644
--- a/src/v-st-ext.adoc
+++ b/src/v-st-ext.adoc
@@ -2807,7 +2807,7 @@ masked va >= x, any vd
 not same as vd and which will be clobbered by the pseudoinstruction
 ----
 
-Compares effectively AND in the mask under a mask-undisturbed policy e.g,
+Compares effectively AND in the mask under a mask-undisturbed policy if the destination register is `v0`, e.g.,
 
 ----
 # (a < b) && (b < c) in two instructions when mask-undisturbed
--
cgit v1.1

From 6d417919323268d09b52cdab211582f57b440e84 Mon Sep 17 00:00:00 2001
From: Nick Knight
Date: Wed, 2 Aug 2023 14:51:49 -0700
Subject: Clarify Zicsr dependence

Signed-off-by: Nick Knight
---
 src/v-st-ext.adoc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc
index 1624a14..ac2eafd 100644
--- a/src/v-st-ext.adoc
+++ b/src/v-st-ext.adoc
@@ -5068,6 +5068,7 @@ All Zve* extensions support all vector permutation instructions
 do not include those with floating-point operands, and Zve64f does not include
 those with EEW=64 floating-point operands.
 
+The Zve32x extension depends on the Zicsr extension.
 The Zve32f and Zve64f extensions depend upon the F extension,
 and implement all vector floating-point instructions
 (Section <>) for
--
cgit v1.1

From ebab759b95f0a7c4b5bffba04885d5edc4f0b286 Mon Sep 17 00:00:00 2001
From: Tsukasa OI
Date: Thu, 3 Aug 2023 04:35:27 +0000
Subject: Zfinx/Zdinx/Zhinx extensions are ratified

This commit removes the word "proposed".
Signed-off-by: Tsukasa OI
---
 src/v-st-ext.adoc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc
index ac2eafd..c83c338 100644
--- a/src/v-st-ext.adoc
+++ b/src/v-st-ext.adoc
@@ -909,7 +909,7 @@ are written to an `x` or `f` register or to element 0 of a vector
 register. Any vector register can be used to hold a scalar regardless
 of the current LMUL setting.
 
-NOTE: Zfinx ("F in X") is a proposed new ISA extension where
+NOTE: Zfinx ("F in X") is a new ISA extension where
 floating-point instructions take their arguments from the integer
 register file. The vector extension is also compatible with Zfinx,
 where the Zfinx vector extension has vector-scalar floating-point
@@ -922,7 +922,7 @@ high-performance scalar floating-point design, and provides
 compatibility with the Zfinx ISA option. Overlaying `f` with `v` would
 provide the advantage of lowering the number of state bits in some
 implementations, but complicates high-performance designs and
-would prevent compatibility with the proposed Zfinx ISA option.
+would prevent compatibility with the Zfinx ISA option.
 
 [[sec-vec-operands]]
 ==== Vector Operands
@@ -2243,7 +2243,7 @@ type width (which includes when FLEN < SEW) are reserved.
 NOTE: Some instructions _zero_-extend the 5-bit immediate, and denote this
 by naming the immediate `uimm` in the assembly syntax.
 
-NOTE: When adding a vector extension to the proposed Zfinx/Zdinx/Zhinx
+NOTE: When adding a vector extension to the Zfinx/Zdinx/Zhinx
 extensions, floating-point scalar arguments are taken from the `x`
 registers. NaN-boxing is not supported in these extensions, and so
 the vector floating-point scalar value is produced using the same
--
cgit v1.1

From 2368a72717a643f8dceda82d0669e5d1bc4c6fce Mon Sep 17 00:00:00 2001
From: Tsukasa OI
Date: Thu, 3 Aug 2023 04:36:52 +0000
Subject: Vector extensions are now ratified

Since it passed the public review, this commit removes references to
"public review".
It doesn't use the word like "ratified" since this is a working version
(not exactly a ratified version).

Signed-off-by: Tsukasa OI
---
 src/v-st-ext.adoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc
index c83c338..c1a105f 100644
--- a/src/v-st-ext.adoc
+++ b/src/v-st-ext.adoc
@@ -4950,8 +4950,8 @@ implementation to support context switching with imprecise traps.
 [[sec-vector-extensions]]
 === Standard Vector Extensions
 
-This section describes the standard vector extensions to be proposed
-for public review. A set of smaller extensions intended for embedded
+This section describes the standard vector extensions.
+A set of smaller extensions intended for embedded
 use are named with a "Zve" prefix, while a larger vector extension
 designed for application processors is named as a single-letter V
 extension. A set of vector length extension names with prefix "Zvl"
--
cgit v1.1

From 146a8d0fbe52f978e15e1aa15d58f8c4a38b1f61 Mon Sep 17 00:00:00 2001
From: Tsukasa OI
Date: Thu, 10 Aug 2023 07:22:07 +0000
Subject: Rename inst-table.adoc to v-inst-table.adoc

Because we share the same name space, inst-table.adoc is too generic.

This commit renames inst-table.adoc to v-inst-table.adoc to make sure
that this is the instruction table for vector.
Signed-off-by: Tsukasa OI --- src/images/wavedrom/inst-table.adoc | 209 ---------------------------------- src/images/wavedrom/v-inst-table.adoc | 209 ++++++++++++++++++++++++++++++++++ src/v-st-ext.adoc | 2 +- 3 files changed, 210 insertions(+), 210 deletions(-) delete mode 100644 src/images/wavedrom/inst-table.adoc create mode 100644 src/images/wavedrom/v-inst-table.adoc diff --git a/src/images/wavedrom/inst-table.adoc b/src/images/wavedrom/inst-table.adoc deleted file mode 100644 index 1c3511b..0000000 --- a/src/images/wavedrom/inst-table.adoc +++ /dev/null @@ -1,209 +0,0 @@ - -// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] -|=== -5+| Integer 4+| Integer 4+| FP - -| funct3 | | | | | funct3 | | | | funct3 | | | -| OPIVV |V| | | | OPMVV |V| | | OPFVV |V| | -| OPIVX | |X| | | OPMVX | |X| | OPFVF | |F| -| OPIVI | | |I| | | | | | | | | -|=== - -// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] -|=== -5+| funct6 4+| funct6 4+| funct6 - -| 000000 |V|X|I| vadd | 000000 |V| | vredsum | 000000 |V|F| vfadd -| 000001 | | | | | 000001 |V| | vredand | 000001 |V| | vfredusum -| 000010 |V|X| | vsub | 000010 |V| | vredor | 000010 |V|F| vfsub -| 000011 | |X|I| vrsub | 000011 |V| | vredxor | 000011 |V| | vfredosum -| 000100 |V|X| | vminu | 000100 |V| | vredminu | 000100 |V|F| vfmin -| 000101 |V|X| | vmin | 000101 |V| | vredmin | 000101 |V| | vfredmin -| 000110 |V|X| | vmaxu | 000110 |V| | vredmaxu | 000110 |V|F| vfmax -| 000111 |V|X| | vmax | 000111 |V| | vredmax | 000111 |V| | vfredmax -| 001000 | | | | | 001000 |V|X| vaaddu | 001000 |V|F| vfsgnj -| 001001 |V|X|I| vand | 001001 |V|X| vaadd | 001001 |V|F| vfsgnjn -| 001010 |V|X|I| vor | 001010 |V|X| vasubu | 001010 |V|F| vfsgnjx -| 001011 |V|X|I| vxor | 001011 |V|X| vasub | 001011 | | | -| 001100 |V|X|I| vrgather | 001100 | | | | 001100 | | | -| 001101 | | | | | 001101 | | | | 001101 | | | -| 001110 | |X|I| vslideup | 001110 | |X| vslide1up | 001110 | |F| vfslide1up -| 001110 |V| | |vrgatherei16| | | | | | | | -| 001111 | |X|I| vslidedown | 001111 | 
|X| vslide1down | 001111 | |F| vfslide1down -|=== - -// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] -|=== -5+| funct6 4+| funct6 4+| funct6 - -| 010000 |V|X|I| vadc | 010000 |V| | VWXUNARY0 | 010000 |V| | VWFUNARY0 -| | | | | | 010000 | |X| VRXUNARY0 | 010000 | |F| VRFUNARY0 -| 010001 |V|X|I| vmadc | 010001 | | | | 010001 | | | -| 010010 |V|X| | vsbc | 010010 |V| | VXUNARY0 | 010010 |V| | VFUNARY0 -| 010011 |V|X| | vmsbc | 010011 | | | | 010011 |V| | VFUNARY1 -| 010100 | | | | | 010100 |V| | VMUNARY0 | 010100 | | | -| 010101 | | | | | 010101 | | | | 010101 | | | -| 010110 | | | | | 010110 | | | | 010110 | | | -| 010111 |V|X|I| vmerge/vmv | 010111 |V| | vcompress | 010111 | |F| vfmerge/vfmv -| 011000 |V|X|I| vmseq | 011000 |V| | vmandn | 011000 |V|F| vmfeq -| 011001 |V|X|I| vmsne | 011001 |V| | vmand | 011001 |V|F| vmfle -| 011010 |V|X| | vmsltu | 011010 |V| | vmor | 011010 | | | -| 011011 |V|X| | vmslt | 011011 |V| | vmxor | 011011 |V|F| vmflt -| 011100 |V|X|I| vmsleu | 011100 |V| | vmorn | 011100 |V|F| vmfne -| 011101 |V|X|I| vmsle | 011101 |V| | vmnand | 011101 | |F| vmfgt -| 011110 | |X|I| vmsgtu | 011110 |V| | vmnor | 011110 | | | -| 011111 | |X|I| vmsgt | 011111 |V| | vmxnor | 011111 | |F| vmfge -|=== - -// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] -|=== -5+| funct6 4+| funct6 4+| funct6 - -| 100000 |V|X|I| vsaddu | 100000 |V|X| vdivu | 100000 |V|F| vfdiv -| 100001 |V|X|I| vsadd | 100001 |V|X| vdiv | 100001 | |F| vfrdiv -| 100010 |V|X| | vssubu | 100010 |V|X| vremu | 100010 | | | -| 100011 |V|X| | vssub | 100011 |V|X| vrem | 100011 | | | -| 100100 | | | | | 100100 |V|X| vmulhu | 100100 |V|F| vfmul -| 100101 |V|X|I| vsll | 100101 |V|X| vmul | 100101 | | | -| 100110 | | | | | 100110 |V|X| vmulhsu | 100110 | | | -| 100111 |V|X| | vsmul | 100111 |V|X| vmulh | 100111 | |F| vfrsub -| 100111 | | |I| vmvr | | | | | | | | -| 101000 |V|X|I| vsrl | 101000 | | | | 101000 |V|F| vfmadd -| 101001 |V|X|I| vsra | 101001 |V|X| vmadd | 101001 |V|F| vfnmadd -| 101010 |V|X|I| vssrl | 101010 | | | 
| 101010 |V|F| vfmsub -| 101011 |V|X|I| vssra | 101011 |V|X| vnmsub | 101011 |V|F| vfnmsub -| 101100 |V|X|I| vnsrl | 101100 | | | | 101100 |V|F| vfmacc -| 101101 |V|X|I| vnsra | 101101 |V|X| vmacc | 101101 |V|F| vfnmacc -| 101110 |V|X|I| vnclipu | 101110 | | | | 101110 |V|F| vfmsac -| 101111 |V|X|I| vnclip | 101111 |V|X| vnmsac | 101111 |V|F| vfnmsac -|=== - -// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] -|=== -5+| funct6 4+| funct6 4+| funct6 - -| 110000 |V| | | vwredsumu | 110000 |V|X| vwaddu | 110000 |V|F| vfwadd -| 110001 |V| | | vwredsum | 110001 |V|X| vwadd | 110001 |V| | vfwredusum -| 110010 | | | | | 110010 |V|X| vwsubu | 110010 |V|F| vfwsub -| 110011 | | | | | 110011 |V|X| vwsub | 110011 |V| | vfwredosum -| 110100 | | | | | 110100 |V|X| vwaddu.w | 110100 |V|F| vfwadd.w -| 110101 | | | | | 110101 |V|X| vwadd.w | 110101 | | | -| 110110 | | | | | 110110 |V|X| vwsubu.w | 110110 |V|F| vfwsub.w -| 110111 | | | | | 110111 |V|X| vwsub.w | 110111 | | | -| 111000 | | | | | 111000 |V|X| vwmulu | 111000 |V|F| vfwmul -| 111001 | | | | | 111001 | | | | 111001 | | | -| 111010 | | | | | 111010 |V|X| vwmulsu | 111010 | | | -| 111011 | | | | | 111011 |V|X| vwmul | 111011 | | | -| 111100 | | | | | 111100 |V|X| vwmaccu | 111100 |V|F| vfwmacc -| 111101 | | | | | 111101 |V|X| vwmacc | 111101 |V|F| vfwnmacc -| 111110 | | | | | 111110 | |X| vwmaccus | 111110 |V|F| vfwmsac -| 111111 | | | | | 111111 |V|X| vwmaccsu | 111111 |V|F| vfwnmsac -|=== - -<<< - -.VRXUNARY0 encoding space -[cols="2,14"] -|=== -| vs2 | - -| 00000 | vmv.s.x -|=== - -.VWXUNARY0 encoding space -[cols="2,14"] -|=== -| vs1 | - -| 00000 | vmv.x.s -| 10000 | vcpop -| 10001 | vfirst -|=== - -.VXUNARY0 encoding space -[cols="2,14"] -|=== -| vs1 | - -| 00010 | vzext.vf8 -| 00011 | vsext.vf8 -| 00100 | vzext.vf4 -| 00101 | vsext.vf4 -| 00110 | vzext.vf2 -| 00111 | vsext.vf2 -|=== - -.VRFUNARY0 encoding space -[cols="2,14"] -|=== -| vs2 | - -| 00000 | vfmv.s.f -|=== - -.VWFUNARY0 encoding space -[cols="2,14"] -|=== -| vs1 | - 
-| 00000 | vfmv.f.s -|=== - -.VFUNARY0 encoding space -[cols="2,14"] -|=== -| vs1 | name - -2+| single-width converts -| 00000 | vfcvt.xu.f.v -| 00001 | vfcvt.x.f.v -| 00010 | vfcvt.f.xu.v -| 00011 | vfcvt.f.x.v -| 00110 | vfcvt.rtz.xu.f.v -| 00111 | vfcvt.rtz.x.f.v -| | -2+| widening converts -| 01000 | vfwcvt.xu.f.v -| 01001 | vfwcvt.x.f.v -| 01010 | vfwcvt.f.xu.v -| 01011 | vfwcvt.f.x.v -| 01100 | vfwcvt.f.f.v -| 01110 | vfwcvt.rtz.xu.f.v -| 01111 | vfwcvt.rtz.x.f.v -| | -2+| narrowing converts -| 10000 | vfncvt.xu.f.w -| 10001 | vfncvt.x.f.w -| 10010 | vfncvt.f.xu.w -| 10011 | vfncvt.f.x.w -| 10100 | vfncvt.f.f.w -| 10101 | vfncvt.rod.f.f.w -| 10110 | vfncvt.rtz.xu.f.w -| 10111 | vfncvt.rtz.x.f.w -|=== - -.VFUNARY1 encoding space -[cols="2,14"] -|=== -| vs1 | name - -| 00000 | vfsqrt.v -| 00100 | vfrsqrt7.v -| 00101 | vfrec7.v -| 10000 | vfclass.v -|=== - - -.VMUNARY0 encoding space -[cols="2,14"] -|=== -| vs1 | - -| 00001 | vmsbf -| 00010 | vmsof -| 00011 | vmsif -| 10000 | viota -| 10001 | vid -|=== - - diff --git a/src/images/wavedrom/v-inst-table.adoc b/src/images/wavedrom/v-inst-table.adoc new file mode 100644 index 0000000..1c3511b --- /dev/null +++ b/src/images/wavedrom/v-inst-table.adoc @@ -0,0 +1,209 @@ + +// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] +|=== +5+| Integer 4+| Integer 4+| FP + +| funct3 | | | | | funct3 | | | | funct3 | | | +| OPIVV |V| | | | OPMVV |V| | | OPFVV |V| | +| OPIVX | |X| | | OPMVX | |X| | OPFVF | |F| +| OPIVI | | |I| | | | | | | | | +|=== + +// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] +|=== +5+| funct6 4+| funct6 4+| funct6 + +| 000000 |V|X|I| vadd | 000000 |V| | vredsum | 000000 |V|F| vfadd +| 000001 | | | | | 000001 |V| | vredand | 000001 |V| | vfredusum +| 000010 |V|X| | vsub | 000010 |V| | vredor | 000010 |V|F| vfsub +| 000011 | |X|I| vrsub | 000011 |V| | vredxor | 000011 |V| | vfredosum +| 000100 |V|X| | vminu | 000100 |V| | vredminu | 000100 |V|F| vfmin +| 000101 |V|X| | vmin | 000101 |V| | vredmin | 000101 |V| | vfredmin +| 000110 
|V|X| | vmaxu | 000110 |V| | vredmaxu | 000110 |V|F| vfmax +| 000111 |V|X| | vmax | 000111 |V| | vredmax | 000111 |V| | vfredmax +| 001000 | | | | | 001000 |V|X| vaaddu | 001000 |V|F| vfsgnj +| 001001 |V|X|I| vand | 001001 |V|X| vaadd | 001001 |V|F| vfsgnjn +| 001010 |V|X|I| vor | 001010 |V|X| vasubu | 001010 |V|F| vfsgnjx +| 001011 |V|X|I| vxor | 001011 |V|X| vasub | 001011 | | | +| 001100 |V|X|I| vrgather | 001100 | | | | 001100 | | | +| 001101 | | | | | 001101 | | | | 001101 | | | +| 001110 | |X|I| vslideup | 001110 | |X| vslide1up | 001110 | |F| vfslide1up +| 001110 |V| | |vrgatherei16| | | | | | | | +| 001111 | |X|I| vslidedown | 001111 | |X| vslide1down | 001111 | |F| vfslide1down +|=== + +// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] +|=== +5+| funct6 4+| funct6 4+| funct6 + +| 010000 |V|X|I| vadc | 010000 |V| | VWXUNARY0 | 010000 |V| | VWFUNARY0 +| | | | | | 010000 | |X| VRXUNARY0 | 010000 | |F| VRFUNARY0 +| 010001 |V|X|I| vmadc | 010001 | | | | 010001 | | | +| 010010 |V|X| | vsbc | 010010 |V| | VXUNARY0 | 010010 |V| | VFUNARY0 +| 010011 |V|X| | vmsbc | 010011 | | | | 010011 |V| | VFUNARY1 +| 010100 | | | | | 010100 |V| | VMUNARY0 | 010100 | | | +| 010101 | | | | | 010101 | | | | 010101 | | | +| 010110 | | | | | 010110 | | | | 010110 | | | +| 010111 |V|X|I| vmerge/vmv | 010111 |V| | vcompress | 010111 | |F| vfmerge/vfmv +| 011000 |V|X|I| vmseq | 011000 |V| | vmandn | 011000 |V|F| vmfeq +| 011001 |V|X|I| vmsne | 011001 |V| | vmand | 011001 |V|F| vmfle +| 011010 |V|X| | vmsltu | 011010 |V| | vmor | 011010 | | | +| 011011 |V|X| | vmslt | 011011 |V| | vmxor | 011011 |V|F| vmflt +| 011100 |V|X|I| vmsleu | 011100 |V| | vmorn | 011100 |V|F| vmfne +| 011101 |V|X|I| vmsle | 011101 |V| | vmnand | 011101 | |F| vmfgt +| 011110 | |X|I| vmsgtu | 011110 |V| | vmnor | 011110 | | | +| 011111 | |X|I| vmsgt | 011111 |V| | vmxnor | 011111 | |F| vmfge +|=== + +// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] +|=== +5+| funct6 4+| funct6 4+| funct6 + +| 100000 |V|X|I| vsaddu | 100000 |V|X| vdivu | 
100000 |V|F| vfdiv +| 100001 |V|X|I| vsadd | 100001 |V|X| vdiv | 100001 | |F| vfrdiv +| 100010 |V|X| | vssubu | 100010 |V|X| vremu | 100010 | | | +| 100011 |V|X| | vssub | 100011 |V|X| vrem | 100011 | | | +| 100100 | | | | | 100100 |V|X| vmulhu | 100100 |V|F| vfmul +| 100101 |V|X|I| vsll | 100101 |V|X| vmul | 100101 | | | +| 100110 | | | | | 100110 |V|X| vmulhsu | 100110 | | | +| 100111 |V|X| | vsmul | 100111 |V|X| vmulh | 100111 | |F| vfrsub +| 100111 | | |I| vmvr | | | | | | | | +| 101000 |V|X|I| vsrl | 101000 | | | | 101000 |V|F| vfmadd +| 101001 |V|X|I| vsra | 101001 |V|X| vmadd | 101001 |V|F| vfnmadd +| 101010 |V|X|I| vssrl | 101010 | | | | 101010 |V|F| vfmsub +| 101011 |V|X|I| vssra | 101011 |V|X| vnmsub | 101011 |V|F| vfnmsub +| 101100 |V|X|I| vnsrl | 101100 | | | | 101100 |V|F| vfmacc +| 101101 |V|X|I| vnsra | 101101 |V|X| vmacc | 101101 |V|F| vfnmacc +| 101110 |V|X|I| vnclipu | 101110 | | | | 101110 |V|F| vfmsac +| 101111 |V|X|I| vnclip | 101111 |V|X| vnmsac | 101111 |V|F| vfnmsac +|=== + +// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] +|=== +5+| funct6 4+| funct6 4+| funct6 + +| 110000 |V| | | vwredsumu | 110000 |V|X| vwaddu | 110000 |V|F| vfwadd +| 110001 |V| | | vwredsum | 110001 |V|X| vwadd | 110001 |V| | vfwredusum +| 110010 | | | | | 110010 |V|X| vwsubu | 110010 |V|F| vfwsub +| 110011 | | | | | 110011 |V|X| vwsub | 110011 |V| | vfwredosum +| 110100 | | | | | 110100 |V|X| vwaddu.w | 110100 |V|F| vfwadd.w +| 110101 | | | | | 110101 |V|X| vwadd.w | 110101 | | | +| 110110 | | | | | 110110 |V|X| vwsubu.w | 110110 |V|F| vfwsub.w +| 110111 | | | | | 110111 |V|X| vwsub.w | 110111 | | | +| 111000 | | | | | 111000 |V|X| vwmulu | 111000 |V|F| vfwmul +| 111001 | | | | | 111001 | | | | 111001 | | | +| 111010 | | | | | 111010 |V|X| vwmulsu | 111010 | | | +| 111011 | | | | | 111011 |V|X| vwmul | 111011 | | | +| 111100 | | | | | 111100 |V|X| vwmaccu | 111100 |V|F| vfwmacc +| 111101 | | | | | 111101 |V|X| vwmacc | 111101 |V|F| vfwnmacc +| 111110 | | | | | 111110 | |X| 
vwmaccus | 111110 |V|F| vfwmsac +| 111111 | | | | | 111111 |V|X| vwmaccsu | 111111 |V|F| vfwnmsac +|=== + +<<< + +.VRXUNARY0 encoding space +[cols="2,14"] +|=== +| vs2 | + +| 00000 | vmv.s.x +|=== + +.VWXUNARY0 encoding space +[cols="2,14"] +|=== +| vs1 | + +| 00000 | vmv.x.s +| 10000 | vcpop +| 10001 | vfirst +|=== + +.VXUNARY0 encoding space +[cols="2,14"] +|=== +| vs1 | + +| 00010 | vzext.vf8 +| 00011 | vsext.vf8 +| 00100 | vzext.vf4 +| 00101 | vsext.vf4 +| 00110 | vzext.vf2 +| 00111 | vsext.vf2 +|=== + +.VRFUNARY0 encoding space +[cols="2,14"] +|=== +| vs2 | + +| 00000 | vfmv.s.f +|=== + +.VWFUNARY0 encoding space +[cols="2,14"] +|=== +| vs1 | + +| 00000 | vfmv.f.s +|=== + +.VFUNARY0 encoding space +[cols="2,14"] +|=== +| vs1 | name + +2+| single-width converts +| 00000 | vfcvt.xu.f.v +| 00001 | vfcvt.x.f.v +| 00010 | vfcvt.f.xu.v +| 00011 | vfcvt.f.x.v +| 00110 | vfcvt.rtz.xu.f.v +| 00111 | vfcvt.rtz.x.f.v +| | +2+| widening converts +| 01000 | vfwcvt.xu.f.v +| 01001 | vfwcvt.x.f.v +| 01010 | vfwcvt.f.xu.v +| 01011 | vfwcvt.f.x.v +| 01100 | vfwcvt.f.f.v +| 01110 | vfwcvt.rtz.xu.f.v +| 01111 | vfwcvt.rtz.x.f.v +| | +2+| narrowing converts +| 10000 | vfncvt.xu.f.w +| 10001 | vfncvt.x.f.w +| 10010 | vfncvt.f.xu.w +| 10011 | vfncvt.f.x.w +| 10100 | vfncvt.f.f.w +| 10101 | vfncvt.rod.f.f.w +| 10110 | vfncvt.rtz.xu.f.w +| 10111 | vfncvt.rtz.x.f.w +|=== + +.VFUNARY1 encoding space +[cols="2,14"] +|=== +| vs1 | name + +| 00000 | vfsqrt.v +| 00100 | vfrsqrt7.v +| 00101 | vfrec7.v +| 10000 | vfclass.v +|=== + + +.VMUNARY0 encoding space +[cols="2,14"] +|=== +| vs1 | + +| 00001 | vmsbf +| 00010 | vmsof +| 00011 | vmsif +| 10000 | viota +| 10001 | vid +|=== + + diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc index c1a105f..d13a923 100644 --- a/src/v-st-ext.adoc +++ b/src/v-st-ext.adoc @@ -5181,5 +5181,5 @@ computation in single-precision will suffice. 
=== Vector Instruction Listing -include::images/wavedrom/inst-table.adoc[] +include::images/wavedrom/v-inst-table.adoc[] -- cgit v1.1 From 380b8520f6fa86a76f1ff54365cb3b98606d2eaf Mon Sep 17 00:00:00 2001 From: Tsukasa OI Date: Thu, 10 Aug 2023 07:24:03 +0000 Subject: Revert "Adding untracked files for vector chapter" This reverts commit a9c934e3a9af53f4f2c669d2f2802cd5114469b4. Because all the files are already moved under src/images/wavedrom, we don't need those files. Signed-off-by: Tsukasa OI --- src/inst-table.adoc | 209 -------------------------------------------------- src/valu-format.adoc | 97 ----------------------- src/vcfg-format.adoc | 44 ----------- src/vfrec7.adoc | 136 -------------------------------- src/vfrsqrt7.adoc | 139 --------------------------------- src/vmem-format.adoc | 102 ------------------------ src/vtype-format.adoc | 27 ------- 7 files changed, 754 deletions(-) delete mode 100644 src/inst-table.adoc delete mode 100644 src/valu-format.adoc delete mode 100644 src/vcfg-format.adoc delete mode 100644 src/vfrec7.adoc delete mode 100644 src/vfrsqrt7.adoc delete mode 100644 src/vmem-format.adoc delete mode 100644 src/vtype-format.adoc diff --git a/src/inst-table.adoc b/src/inst-table.adoc deleted file mode 100644 index 1c3511b..0000000 --- a/src/inst-table.adoc +++ /dev/null @@ -1,209 +0,0 @@ - -// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] -|=== -5+| Integer 4+| Integer 4+| FP - -| funct3 | | | | | funct3 | | | | funct3 | | | -| OPIVV |V| | | | OPMVV |V| | | OPFVV |V| | -| OPIVX | |X| | | OPMVX | |X| | OPFVF | |F| -| OPIVI | | |I| | | | | | | | | -|=== - -// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] -|=== -5+| funct6 4+| funct6 4+| funct6 - -| 000000 |V|X|I| vadd | 000000 |V| | vredsum | 000000 |V|F| vfadd -| 000001 | | | | | 000001 |V| | vredand | 000001 |V| | vfredusum -| 000010 |V|X| | vsub | 000010 |V| | vredor | 000010 |V|F| vfsub -| 000011 | |X|I| vrsub | 000011 |V| | vredxor | 000011 |V| | vfredosum -| 000100 |V|X| | vminu | 000100 |V| | vredminu | 
000100 |V|F| vfmin -| 000101 |V|X| | vmin | 000101 |V| | vredmin | 000101 |V| | vfredmin -| 000110 |V|X| | vmaxu | 000110 |V| | vredmaxu | 000110 |V|F| vfmax -| 000111 |V|X| | vmax | 000111 |V| | vredmax | 000111 |V| | vfredmax -| 001000 | | | | | 001000 |V|X| vaaddu | 001000 |V|F| vfsgnj -| 001001 |V|X|I| vand | 001001 |V|X| vaadd | 001001 |V|F| vfsgnjn -| 001010 |V|X|I| vor | 001010 |V|X| vasubu | 001010 |V|F| vfsgnjx -| 001011 |V|X|I| vxor | 001011 |V|X| vasub | 001011 | | | -| 001100 |V|X|I| vrgather | 001100 | | | | 001100 | | | -| 001101 | | | | | 001101 | | | | 001101 | | | -| 001110 | |X|I| vslideup | 001110 | |X| vslide1up | 001110 | |F| vfslide1up -| 001110 |V| | |vrgatherei16| | | | | | | | -| 001111 | |X|I| vslidedown | 001111 | |X| vslide1down | 001111 | |F| vfslide1down -|=== - -// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] -|=== -5+| funct6 4+| funct6 4+| funct6 - -| 010000 |V|X|I| vadc | 010000 |V| | VWXUNARY0 | 010000 |V| | VWFUNARY0 -| | | | | | 010000 | |X| VRXUNARY0 | 010000 | |F| VRFUNARY0 -| 010001 |V|X|I| vmadc | 010001 | | | | 010001 | | | -| 010010 |V|X| | vsbc | 010010 |V| | VXUNARY0 | 010010 |V| | VFUNARY0 -| 010011 |V|X| | vmsbc | 010011 | | | | 010011 |V| | VFUNARY1 -| 010100 | | | | | 010100 |V| | VMUNARY0 | 010100 | | | -| 010101 | | | | | 010101 | | | | 010101 | | | -| 010110 | | | | | 010110 | | | | 010110 | | | -| 010111 |V|X|I| vmerge/vmv | 010111 |V| | vcompress | 010111 | |F| vfmerge/vfmv -| 011000 |V|X|I| vmseq | 011000 |V| | vmandn | 011000 |V|F| vmfeq -| 011001 |V|X|I| vmsne | 011001 |V| | vmand | 011001 |V|F| vmfle -| 011010 |V|X| | vmsltu | 011010 |V| | vmor | 011010 | | | -| 011011 |V|X| | vmslt | 011011 |V| | vmxor | 011011 |V|F| vmflt -| 011100 |V|X|I| vmsleu | 011100 |V| | vmorn | 011100 |V|F| vmfne -| 011101 |V|X|I| vmsle | 011101 |V| | vmnand | 011101 | |F| vmfgt -| 011110 | |X|I| vmsgtu | 011110 |V| | vmnor | 011110 | | | -| 011111 | |X|I| vmsgt | 011111 |V| | vmxnor | 011111 | |F| vmfge -|=== - -// 
[cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] -|=== -5+| funct6 4+| funct6 4+| funct6 - -| 100000 |V|X|I| vsaddu | 100000 |V|X| vdivu | 100000 |V|F| vfdiv -| 100001 |V|X|I| vsadd | 100001 |V|X| vdiv | 100001 | |F| vfrdiv -| 100010 |V|X| | vssubu | 100010 |V|X| vremu | 100010 | | | -| 100011 |V|X| | vssub | 100011 |V|X| vrem | 100011 | | | -| 100100 | | | | | 100100 |V|X| vmulhu | 100100 |V|F| vfmul -| 100101 |V|X|I| vsll | 100101 |V|X| vmul | 100101 | | | -| 100110 | | | | | 100110 |V|X| vmulhsu | 100110 | | | -| 100111 |V|X| | vsmul | 100111 |V|X| vmulh | 100111 | |F| vfrsub -| 100111 | | |I| vmvr | | | | | | | | -| 101000 |V|X|I| vsrl | 101000 | | | | 101000 |V|F| vfmadd -| 101001 |V|X|I| vsra | 101001 |V|X| vmadd | 101001 |V|F| vfnmadd -| 101010 |V|X|I| vssrl | 101010 | | | | 101010 |V|F| vfmsub -| 101011 |V|X|I| vssra | 101011 |V|X| vnmsub | 101011 |V|F| vfnmsub -| 101100 |V|X|I| vnsrl | 101100 | | | | 101100 |V|F| vfmacc -| 101101 |V|X|I| vnsra | 101101 |V|X| vmacc | 101101 |V|F| vfnmacc -| 101110 |V|X|I| vnclipu | 101110 | | | | 101110 |V|F| vfmsac -| 101111 |V|X|I| vnclip | 101111 |V|X| vnmsac | 101111 |V|F| vfnmsac -|=== - -// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] -|=== -5+| funct6 4+| funct6 4+| funct6 - -| 110000 |V| | | vwredsumu | 110000 |V|X| vwaddu | 110000 |V|F| vfwadd -| 110001 |V| | | vwredsum | 110001 |V|X| vwadd | 110001 |V| | vfwredusum -| 110010 | | | | | 110010 |V|X| vwsubu | 110010 |V|F| vfwsub -| 110011 | | | | | 110011 |V|X| vwsub | 110011 |V| | vfwredosum -| 110100 | | | | | 110100 |V|X| vwaddu.w | 110100 |V|F| vfwadd.w -| 110101 | | | | | 110101 |V|X| vwadd.w | 110101 | | | -| 110110 | | | | | 110110 |V|X| vwsubu.w | 110110 |V|F| vfwsub.w -| 110111 | | | | | 110111 |V|X| vwsub.w | 110111 | | | -| 111000 | | | | | 111000 |V|X| vwmulu | 111000 |V|F| vfwmul -| 111001 | | | | | 111001 | | | | 111001 | | | -| 111010 | | | | | 111010 |V|X| vwmulsu | 111010 | | | -| 111011 | | | | | 111011 |V|X| vwmul | 111011 | | | -| 111100 | | | | | 111100 |V|X| vwmaccu | 
111100 |V|F| vfwmacc -| 111101 | | | | | 111101 |V|X| vwmacc | 111101 |V|F| vfwnmacc -| 111110 | | | | | 111110 | |X| vwmaccus | 111110 |V|F| vfwmsac -| 111111 | | | | | 111111 |V|X| vwmaccsu | 111111 |V|F| vfwnmsac -|=== - -<<< - -.VRXUNARY0 encoding space -[cols="2,14"] -|=== -| vs2 | - -| 00000 | vmv.s.x -|=== - -.VWXUNARY0 encoding space -[cols="2,14"] -|=== -| vs1 | - -| 00000 | vmv.x.s -| 10000 | vcpop -| 10001 | vfirst -|=== - -.VXUNARY0 encoding space -[cols="2,14"] -|=== -| vs1 | - -| 00010 | vzext.vf8 -| 00011 | vsext.vf8 -| 00100 | vzext.vf4 -| 00101 | vsext.vf4 -| 00110 | vzext.vf2 -| 00111 | vsext.vf2 -|=== - -.VRFUNARY0 encoding space -[cols="2,14"] -|=== -| vs2 | - -| 00000 | vfmv.s.f -|=== - -.VWFUNARY0 encoding space -[cols="2,14"] -|=== -| vs1 | - -| 00000 | vfmv.f.s -|=== - -.VFUNARY0 encoding space -[cols="2,14"] -|=== -| vs1 | name - -2+| single-width converts -| 00000 | vfcvt.xu.f.v -| 00001 | vfcvt.x.f.v -| 00010 | vfcvt.f.xu.v -| 00011 | vfcvt.f.x.v -| 00110 | vfcvt.rtz.xu.f.v -| 00111 | vfcvt.rtz.x.f.v -| | -2+| widening converts -| 01000 | vfwcvt.xu.f.v -| 01001 | vfwcvt.x.f.v -| 01010 | vfwcvt.f.xu.v -| 01011 | vfwcvt.f.x.v -| 01100 | vfwcvt.f.f.v -| 01110 | vfwcvt.rtz.xu.f.v -| 01111 | vfwcvt.rtz.x.f.v -| | -2+| narrowing converts -| 10000 | vfncvt.xu.f.w -| 10001 | vfncvt.x.f.w -| 10010 | vfncvt.f.xu.w -| 10011 | vfncvt.f.x.w -| 10100 | vfncvt.f.f.w -| 10101 | vfncvt.rod.f.f.w -| 10110 | vfncvt.rtz.xu.f.w -| 10111 | vfncvt.rtz.x.f.w -|=== - -.VFUNARY1 encoding space -[cols="2,14"] -|=== -| vs1 | name - -| 00000 | vfsqrt.v -| 00100 | vfrsqrt7.v -| 00101 | vfrec7.v -| 10000 | vfclass.v -|=== - - -.VMUNARY0 encoding space -[cols="2,14"] -|=== -| vs1 | - -| 00001 | vmsbf -| 00010 | vmsof -| 00011 | vmsif -| 10000 | viota -| 10001 | vid -|=== - - diff --git a/src/valu-format.adoc b/src/valu-format.adoc deleted file mode 100644 index c6f6f52..0000000 --- a/src/valu-format.adoc +++ /dev/null @@ -1,97 +0,0 @@ -Formats for Vector Arithmetic 
Instructions under OP-V major opcode - -//// -31 26 25 24 20 19 15 14 12 11 7 6 0 - funct6 | vm | vs2 | vs1 | 0 0 0 | vd |1010111| OP-V (OPIVV) - funct6 | vm | vs2 | vs1 | 0 0 1 | vd/rd |1010111| OP-V (OPFVV) - funct6 | vm | vs2 | vs1 | 0 1 0 | vd/rd |1010111| OP-V (OPMVV) - funct6 | vm | vs2 | imm[4:0] | 0 1 1 | vd |1010111| OP-V (OPIVI) - funct6 | vm | vs2 | rs1 | 1 0 0 | vd |1010111| OP-V (OPIVX) - funct6 | vm | vs2 | rs1 | 1 0 1 | vd |1010111| OP-V (OPFVF) - funct6 | vm | vs2 | rs1 | 1 1 0 | vd/rd |1010111| OP-V (OPMVX) - 6 1 5 5 3 5 7 -//// - -```wavedrom -{reg: [ - {bits: 7, name: 0x57, attr: 'OPIVV'}, - {bits: 5, name: 'vd', type: 2}, - {bits: 3, name: 0}, - {bits: 5, name: 'vs1', type: 2}, - {bits: 5, name: 'vs2', type: 2}, - {bits: 1, name: 'vm'}, - {bits: 6, name: 'funct6'}, -]} -``` - -```wavedrom -{reg: [ - {bits: 7, name: 0x57, attr: 'OPFVV'}, - {bits: 5, name: 'vd / rd', type: 7}, - {bits: 3, name: 1}, - {bits: 5, name: 'vs1', type: 2}, - {bits: 5, name: 'vs2', type: 2}, - {bits: 1, name: 'vm'}, - {bits: 6, name: 'funct6'}, -]} -``` - -```wavedrom -{reg: [ - {bits: 7, name: 0x57, attr: 'OPMVV'}, - {bits: 5, name: 'vd / rd', type: 7}, - {bits: 3, name: 2}, - {bits: 5, name: 'vs1', type: 2}, - {bits: 5, name: 'vs2', type: 2}, - {bits: 1, name: 'vm'}, - {bits: 6, name: 'funct6'}, -]} -``` - -```wavedrom -{reg: [ - {bits: 7, name: 0x57, attr: ['OPIVI']}, - {bits: 5, name: 'vd', type: 2}, - {bits: 3, name: 3}, - {bits: 5, name: 'imm[4:0]', type: 5}, - {bits: 5, name: 'vs2', type: 2}, - {bits: 1, name: 'vm'}, - {bits: 6, name: 'funct6'}, -]} -``` - -```wavedrom -{reg: [ - {bits: 7, name: 0x57, attr: 'OPIVX'}, - {bits: 5, name: 'vd', type: 2}, - {bits: 3, name: 4}, - {bits: 5, name: 'rs1', type: 4}, - {bits: 5, name: 'vs2', type: 2}, - {bits: 1, name: 'vm'}, - {bits: 6, name: 'funct6'}, -]} -``` - -```wavedrom -{reg: [ - {bits: 7, name: 0x57, attr: 'OPFVF'}, - {bits: 5, name: 'vd', type: 2}, - {bits: 3, name: 5}, - {bits: 5, name: 'rs1', type: 4}, - {bits: 
5, name: 'vs2', type: 2}, - {bits: 1, name: 'vm'}, - {bits: 6, name: 'funct6'}, -]} -``` - -```wavedrom -{reg: [ - {bits: 7, name: 0x57, attr: 'OPMVX'}, - {bits: 5, name: 'vd / rd', type: 7}, - {bits: 3, name: 6}, - {bits: 5, name: 'rs1', type: 4}, - {bits: 5, name: 'vs2', type: 2}, - {bits: 1, name: 'vm'}, - {bits: 6, name: 'funct6'}, -]} -``` diff --git a/src/vcfg-format.adoc b/src/vcfg-format.adoc deleted file mode 100644 index f1bb4c0..0000000 --- a/src/vcfg-format.adoc +++ /dev/null @@ -1,44 +0,0 @@ -Formats for Vector Configuration Instructions under OP-V major opcode - -//// - 31 30 25 24 20 19 15 14 12 11 7 6 0 - 0 | zimm[10:0] | rs1 | 1 1 1 | rd |1010111| vsetvli - 1 | 1| zimm[ 9:0] | uimm[4:0]| 1 1 1 | rd |1010111| vsetivli - 1 | 000000 | rs2 | rs1 | 1 1 1 | rd |1010111| vsetvl - 1 6 5 5 3 5 7 -//// - -```wavedrom -{reg: [ - {bits: 7, name: 0x57, attr: 'vsetvli'}, - {bits: 5, name: 'rd', type: 4}, - {bits: 3, name: 7}, - {bits: 5, name: 'rs1', type: 4}, - {bits: 11, name: 'vtypei[10:0]', type: 5}, - {bits: 1, name: '0'}, -]} -``` - -```wavedrom -{reg: [ - {bits: 7, name: 0x57, attr: 'vsetivli'}, - {bits: 5, name: 'rd', type: 4}, - {bits: 3, name: 7}, - {bits: 5, name: 'uimm[4:0]', type: 5}, - {bits: 10, name: 'vtypei[9:0]', type: 5}, - {bits: 1, name: '1'}, - {bits: 1, name: '1'}, -]} -``` - -```wavedrom -{reg: [ - {bits: 7, name: 0x57, attr: 'vsetvl'}, - {bits: 5, name: 'rd', type: 4}, - {bits: 3, name: 7}, - {bits: 5, name: 'rs1', type: 4}, - {bits: 5, name: 'rs2', type: 4}, - {bits: 6, name: 0x00}, - {bits: 1, name: 1}, -]} -``` diff --git a/src/vfrec7.adoc b/src/vfrec7.adoc deleted file mode 100644 index 02abe60..0000000 --- a/src/vfrec7.adoc +++ /dev/null @@ -1,136 +0,0 @@ -.vfrec7.v common-case lookup table contents -[%autowidth] -|=== - -| sig[MSB -: 7] | sig_out[MSB -: 7] - -| 0 | 127 -| 1 | 125 -| 2 | 123 -| 3 | 121 -| 4 | 119 -| 5 | 117 -| 6 | 116 -| 7 | 114 -| 8 | 112 -| 9 | 110 -| 10 | 109 -| 11 | 107 -| 12 | 105 -| 13 | 104 -| 14 | 102 -| 15 
| 100 -| 16 | 99 -| 17 | 97 -| 18 | 96 -| 19 | 94 -| 20 | 93 -| 21 | 91 -| 22 | 90 -| 23 | 88 -| 24 | 87 -| 25 | 85 -| 26 | 84 -| 27 | 83 -| 28 | 81 -| 29 | 80 -| 30 | 79 -| 31 | 77 -| 32 | 76 -| 33 | 75 -| 34 | 74 -| 35 | 72 -| 36 | 71 -| 37 | 70 -| 38 | 69 -| 39 | 68 -| 40 | 66 -| 41 | 65 -| 42 | 64 -| 43 | 63 -| 44 | 62 -| 45 | 61 -| 46 | 60 -| 47 | 59 -| 48 | 58 -| 49 | 57 -| 50 | 56 -| 51 | 55 -| 52 | 54 -| 53 | 53 -| 54 | 52 -| 55 | 51 -| 56 | 50 -| 57 | 49 -| 58 | 48 -| 59 | 47 -| 60 | 46 -| 61 | 45 -| 62 | 44 -| 63 | 43 -| 64 | 42 -| 65 | 41 -| 66 | 40 -| 67 | 40 -| 68 | 39 -| 69 | 38 -| 70 | 37 -| 71 | 36 -| 72 | 35 -| 73 | 35 -| 74 | 34 -| 75 | 33 -| 76 | 32 -| 77 | 31 -| 78 | 31 -| 79 | 30 -| 80 | 29 -| 81 | 28 -| 82 | 28 -| 83 | 27 -| 84 | 26 -| 85 | 25 -| 86 | 25 -| 87 | 24 -| 88 | 23 -| 89 | 23 -| 90 | 22 -| 91 | 21 -| 92 | 21 -| 93 | 20 -| 94 | 19 -| 95 | 19 -| 96 | 18 -| 97 | 17 -| 98 | 17 -| 99 | 16 -| 100 | 15 -| 101 | 15 -| 102 | 14 -| 103 | 14 -| 104 | 13 -| 105 | 12 -| 106 | 12 -| 107 | 11 -| 108 | 11 -| 109 | 10 -| 110 | 9 -| 111 | 9 -| 112 | 8 -| 113 | 8 -| 114 | 7 -| 115 | 7 -| 116 | 6 -| 117 | 5 -| 118 | 5 -| 119 | 4 -| 120 | 4 -| 121 | 3 -| 122 | 3 -| 123 | 2 -| 124 | 2 -| 125 | 1 -| 126 | 1 -| 127 | 0 - -|=== diff --git a/src/vfrsqrt7.adoc b/src/vfrsqrt7.adoc deleted file mode 100644 index ace8022..0000000 --- a/src/vfrsqrt7.adoc +++ /dev/null @@ -1,139 +0,0 @@ -.vfrsqrt7.v common-case lookup table contents -[%autowidth] -|=== - -|exp[0] | sig[MSB -: 6] | sig_out[MSB -: 7] - -.64+|0 -| 0 | 52 -| 1 | 51 -| 2 | 50 -| 3 | 48 -| 4 | 47 -| 5 | 46 -| 6 | 44 -| 7 | 43 -| 8 | 42 -| 9 | 41 -| 10 | 40 -| 11 | 39 -| 12 | 38 -| 13 | 36 -| 14 | 35 -| 15 | 34 -| 16 | 33 -| 17 | 32 -| 18 | 31 -| 19 | 30 -| 20 | 30 -| 21 | 29 -| 22 | 28 -| 23 | 27 -| 24 | 26 -| 25 | 25 -| 26 | 24 -| 27 | 23 -| 28 | 23 -| 29 | 22 -| 30 | 21 -| 31 | 20 -| 32 | 19 -| 33 | 19 -| 34 | 18 -| 35 | 17 -| 36 | 16 -| 37 | 16 -| 38 | 15 -| 39 | 14 -| 40 | 14 -| 41 | 13 -| 42 | 12 
-| 43 | 12 -| 44 | 11 -| 45 | 10 -| 46 | 10 -| 47 | 9 -| 48 | 9 -| 49 | 8 -| 50 | 7 -| 51 | 7 -| 52 | 6 -| 53 | 6 -| 54 | 5 -| 55 | 4 -| 56 | 4 -| 57 | 3 -| 58 | 3 -| 59 | 2 -| 60 | 2 -| 61 | 1 -| 62 | 1 -| 63 | 0 - -.64+|1 -| 0 | 127 -| 1 | 125 -| 2 | 123 -| 3 | 121 -| 4 | 119 -| 5 | 118 -| 6 | 116 -| 7 | 114 -| 8 | 113 -| 9 | 111 -| 10 | 109 -| 11 | 108 -| 12 | 106 -| 13 | 105 -| 14 | 103 -| 15 | 102 -| 16 | 100 -| 17 | 99 -| 18 | 97 -| 19 | 96 -| 20 | 95 -| 21 | 93 -| 22 | 92 -| 23 | 91 -| 24 | 90 -| 25 | 88 -| 26 | 87 -| 27 | 86 -| 28 | 85 -| 29 | 84 -| 30 | 83 -| 31 | 82 -| 32 | 80 -| 33 | 79 -| 34 | 78 -| 35 | 77 -| 36 | 76 -| 37 | 75 -| 38 | 74 -| 39 | 73 -| 40 | 72 -| 41 | 71 -| 42 | 70 -| 43 | 70 -| 44 | 69 -| 45 | 68 -| 46 | 67 -| 47 | 66 -| 48 | 65 -| 49 | 64 -| 50 | 63 -| 51 | 63 -| 52 | 62 -| 53 | 61 -| 54 | 60 -| 55 | 59 -| 56 | 59 -| 57 | 58 -| 58 | 57 -| 59 | 56 -| 60 | 56 -| 61 | 55 -| 62 | 54 -| 63 | 53 - -|=== diff --git a/src/vmem-format.adoc b/src/vmem-format.adoc deleted file mode 100644 index 3b20043..0000000 --- a/src/vmem-format.adoc +++ /dev/null @@ -1,102 +0,0 @@ -Format for Vector Load Instructions under LOAD-FP major opcode - -//// -31 29 28 27 26 25 24 20 19 15 14 12 11 7 6 0 - nf | mew| mop | vm | lumop | rs1 | width | vd |0000111| VL* unit-stride - nf | mew| mop | vm | rs2 | rs1 | width | vd |0000111| VLS* strided - nf | mew| mop | vm | vs2 | rs1 | width | vd |0000111| VLX* indexed - 3 1 2 1 5 5 3 5 7 -//// - -```wavedrom -{reg: [ - {bits: 7, name: 0x7, attr: 'VL* unit-stride'}, - {bits: 5, name: 'vd', attr: 'destination of load', type: 2}, - {bits: 3, name: 'width'}, - {bits: 5, name: 'rs1', attr: 'base address', type: 4}, - {bits: 5, name: 'lumop'}, - {bits: 1, name: 'vm'}, - {bits: 2, name: 'mop'}, - {bits: 1, name: 'mew'}, - {bits: 3, name: 'nf'}, -]} -``` - -```wavedrom -{reg: [ - {bits: 7, name: 0x7, attr: 'VLS* strided'}, - {bits: 5, name: 'vd', attr: 'destination of load', type: 2}, - {bits: 3, name: 'width'}, - {bits: 5, 
name: 'rs1', attr: 'base address', type: 4}, - {bits: 5, name: 'rs2', attr: 'stride', type: 4}, - {bits: 1, name: 'vm'}, - {bits: 2, name: 'mop'}, - {bits: 1, name: 'mew'}, - {bits: 3, name: 'nf'}, -]} -``` - -```wavedrom -{reg: [ - {bits: 7, name: 0x7, attr: 'VLX* indexed'}, - {bits: 5, name: 'vd', attr: 'destination of load', type: 2}, - {bits: 3, name: 'width'}, - {bits: 5, name: 'rs1', attr: 'base address', type: 4}, - {bits: 5, name: 'vs2', attr: 'address offsets', type: 2}, - {bits: 1, name: 'vm'}, - {bits: 2, name: 'mop'}, - {bits: 1, name: 'mew'}, - {bits: 3, name: 'nf'}, -]} -``` -Format for Vector Store Instructions under STORE-FP major opcode - -//// -31 29 28 27 26 25 24 20 19 15 14 12 11 7 6 0 - nf | mew| mop | vm | sumop | rs1 | width | vs3 |0100111| VS* unit-stride - nf | mew| mop | vm | rs2 | rs1 | width | vs3 |0100111| VSS* strided - nf | mew| mop | vm | vs2 | rs1 | width | vs3 |0100111| VSX* indexed - 3 1 2 1 5 5 3 5 7 -//// - -```wavedrom -{reg: [ - {bits: 7, name: 0x27, attr: 'VS* unit-stride'}, - {bits: 5, name: 'vs3', attr: 'store data', type: 2}, - {bits: 3, name: 'width'}, - {bits: 5, name: 'rs1', attr: 'base address', type: 4}, - {bits: 5, name: 'sumop'}, - {bits: 1, name: 'vm'}, - {bits: 2, name: 'mop'}, - {bits: 1, name: 'mew'}, - {bits: 3, name: 'nf'}, -]} -``` - -```wavedrom -{reg: [ - {bits: 7, name: 0x27, attr: 'VSS* strided'}, - {bits: 5, name: 'vs3', attr: 'store data', type: 2}, - {bits: 3, name: 'width'}, - {bits: 5, name: 'rs1', attr: 'base address', type: 4}, - {bits: 5, name: 'rs2', attr: 'stride', type: 4}, - {bits: 1, name: 'vm'}, - {bits: 2, name: 'mop'}, - {bits: 1, name: 'mew'}, - {bits: 3, name: 'nf'}, -]} -``` - -```wavedrom -{reg: [ - {bits: 7, name: 0x27, attr: 'VSX* indexed'}, - {bits: 5, name: 'vs3', attr: 'store data', type: 2}, - {bits: 3, name: 'width'}, - {bits: 5, name: 'rs1', attr: 'base address', type: 4}, - {bits: 5, name: 'vs2', attr: 'address offsets', type: 2}, - {bits: 1, name: 'vm'}, - {bits: 2, name: 
'mop'}, - {bits: 1, name: 'mew'}, - {bits: 3, name: 'nf'}, -]} -``` diff --git a/src/vtype-format.adoc b/src/vtype-format.adoc deleted file mode 100644 index a97af34..0000000 --- a/src/vtype-format.adoc +++ /dev/null @@ -1,27 +0,0 @@ -```wavedrom -{reg: [ - {bits: 3, name: 'vlmul[2:0]'}, - {bits: 3, name: 'vsew[2:0]'}, - {bits: 1, name: 'vta'}, - {bits: 1, name: 'vma'}, - {bits: 23, name: 'reserved'}, - {bits: 1, name: 'vill'}, -]} -``` - -NOTE: This diagram shows the layout for RV32 systems, whereas in -general `vill` should be at bit XLEN-1. - -.`vtype` register layout -[cols=">2,4,10"] -[%autowidth] -|=== -| Bits | Name | Description - -| XLEN-1 | vill | Illegal value if set -| XLEN-2:8 | 0 | Reserved if non-zero -| 7 | vma | Vector mask agnostic -| 6 | vta | Vector tail agnostic -| 5:3 | vsew[2:0] | Selected element width (SEW) setting -| 2:0 | vlmul[2:0] | Vector register group multiplier (LMUL) setting -|=== -- cgit v1.1 From 2eac1baf542256c0298cb6f4677eb30f029ab5ef Mon Sep 17 00:00:00 2001 From: eopXD Date: Thu, 10 Aug 2023 15:53:29 +0900 Subject: Fix error in description for vslide1up Signed-off-by: eop Chen --- src/v-st-ext.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc index c1a105f..3e5575d 100644 --- a/src/v-st-ext.adoc +++ b/src/v-st-ext.adoc @@ -4590,7 +4590,7 @@ location 0 of the destination vector register group, provided that element 0 is active, otherwise the destination element update follows the current mask agnostic/undisturbed policy. If XLEN < SEW, the value is sign-extended to SEW bits. If XLEN > SEW, the least-significant bits -are copied over and the high SEW-XLEN bits are ignored. +are copied over and the high XLEN-SEW bits are ignored. 
The remaining active `vl`-1 elements are copied over from index _i_ in the source vector register group to index _i_+1 in the destination -- cgit v1.1 From 819286d3bbff740780c4593c47e081744b73ac47 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Fri, 12 Jan 2024 06:58:41 -0500 Subject: Clarify out-of-range behavior for vfncvt.rod.f.f.w Manually applying change from Vector spec. --- src/v-st-ext.adoc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc index 3e5575d..20f047c 100644 --- a/src/v-st-ext.adoc +++ b/src/v-st-ext.adoc @@ -3826,6 +3826,9 @@ the same exception flags are raised if all but the last halving step use round-towards-odd (`vfncvt.rod.f.f.w`). Only the final step should use the desired rounding mode. +NOTE: For `vfncvt.rod.f.f.w`, a finite value that exceeds the range of the +destination format is converted to the destination format's largest finite value with the same sign. + === Vector Reduction Operations Vector reduction operations take a vector register group of elements -- cgit v1.1 From 6cd53fe0302ffeccd761b55919bad605d36cc04e Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Fri, 12 Jan 2024 07:04:42 -0500 Subject: vslide unchanged range correction Manually applying a change that was made to the vector spec. --- src/v-st-ext.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc index 20f047c..0e44870 100644 --- a/src/v-st-ext.adoc +++ b/src/v-st-ext.adoc @@ -4536,7 +4536,7 @@ if _OFFSET_ < `vl`. OFFSET is amount to slideup, either from x register or a 5-bit immediate - 0 < i < max(vstart, OFFSET) Unchanged + 0 <= i < max(vstart, OFFSET) Unchanged max(vstart, OFFSET) <= i < vl vd[i] = vs2[i-OFFSET] if v0.mask[i] enabled vl <= i < VLMAX Follow tail policy ---- @@ -4572,7 +4572,7 @@ If XLEN > SEW, _OFFSET_ is _not_ truncated to SEW bits. 
VLMAX <= i+OFFSET src[i] = 0 vslidedown behavior for destination element i in slide - 0 < i < vstart Unchanged + 0 <= i < vstart Unchanged vstart <= i < vl vd[i] = src[i] if v0.mask[i] enabled vl <= i < VLMAX Follow tail policy -- cgit v1.1 From a7854a76566ec41140f5f9746955c10eb2f44e16 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Mon, 29 Jan 2024 15:41:24 -0500 Subject: Removing priv latex build Removing priv latex build --- build/Makefile | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/build/Makefile b/build/Makefile index 7fe0f6f..9b72b01 100644 --- a/build/Makefile +++ b/build/Makefile @@ -34,13 +34,13 @@ ASCIIDOCTOR_OPTS := --attribute=mathematical-format=svg \ SRCDIR := ../src # LaTeX source and related files -SRCS := $(wildcard $(SRCDIR)/latex/*.tex) -FIGS := $(wildcard $(SRCDIR)/latex/figs/*) -BIBS := $(SRCDIR)/latex/riscv-spec.bib +#SRCS := $(wildcard $(SRCDIR)/latex/*.tex) +#FIGS := $(wildcard $(SRCDIR)/latex/figs/*) +#BIBS := $(SRCDIR)/latex/riscv-spec.bib # LaTeX build tools -PDFLATEX := TEXINPUTS=$(SRCDIR)/latex: pdflatex -interaction=nonstopmode -halt-on-error -BIBTEX := BIBINPUTS=$(SRCDIR)/latex: bibtex +#PDFLATEX := TEXINPUTS=$(SRCDIR)/latex: pdflatex -interaction=nonstopmode -halt-on-error +#BIBTEX := BIBINPUTS=$(SRCDIR)/latex: bibtex # Temporary files to clean up for LaTeX build JUNK := *.pdf *.aux *.log *.bbl *.blg *.toc *.out *.fdb_latexmk *.fls *.synctex.gz @@ -78,13 +78,13 @@ unpriv-isa-asciidoc.html: $(SRCDIR)/riscv-unprivileged.adoc asciidoctor $(ASCIIDOCTOR_OPTS) --out-file=$@ $< # LaTeX build for Privileged ISA -priv-latex: riscv-privileged.pdf +#priv-latex: riscv-privileged.pdf -riscv-privileged.pdf: $(SRCDIR)/latex/riscv-privileged.tex $(SRCS) $(FIGS) $(BIBS) - $(PDFLATEX) riscv-privileged - $(BIBTEX) riscv-privileged - $(PDFLATEX) riscv-privileged - $(PDFLATEX) riscv-privileged +#riscv-privileged.pdf: $(SRCDIR)/latex/riscv-privileged.tex $(SRCS) $(FIGS) $(BIBS) +# 
$(PDFLATEX) riscv-privileged +# $(BIBTEX) riscv-privileged +# $(PDFLATEX) riscv-privileged +# $(PDFLATEX) riscv-privileged clean: @if [ -f priv-isa-asciidoc.pdf ]; then \ @@ -103,11 +103,11 @@ clean: echo "Removing unpriv-isa-asciidoc.html"; \ rm -f unpriv-isa-asciidoc.html; \ fi - @echo "Cleaning up files from LaTeX build" - @cd $(SRCDIR)/latex; \ - for file in $(JUNK); do \ - if [ -f "$$file" ]; then \ - echo "Removing $$file"; \ - rm -f "$$file"; \ - fi; \ - done +# @echo "Cleaning up files from LaTeX build" +# @cd $(SRCDIR)/latex; \ +# for file in $(JUNK); do \ +# if [ -f "$$file" ]; then \ +# echo "Removing $$file"; \ +# rm -f "$$file"; \ +# fi; \ +# done -- cgit v1.1 From ed3048981ff0ae1ebba579405f7366bee27ab279 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Tue, 30 Jan 2024 12:46:18 -0500 Subject: Update src/b-st-ext.adoc Co-authored-by: Kersten Richter Signed-off-by: Bill Traynor --- src/b-st-ext.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index f8e3999..7bf1dc8 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -451,7 +451,7 @@ along with their specific mapping: |==== [#zba,reftext=Address generation instructions] -==== Zba extension +==== Zba: Address generation [NOTE,caption=Frozen] ==== -- cgit v1.1 From dbc79cf28a29f098e0c053c556ad421891554ebb Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Wed, 31 Jan 2024 10:33:06 -0500 Subject: Initial seed of zc.adoc to src tree. Added the zc.adoc spec to the src tree. 
---
 src/zc.adoc | 393 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 393 insertions(+)
 create mode 100644 src/zc.adoc

diff --git a/src/zc.adoc b/src/zc.adoc
new file mode 100644
index 0000000..0ec1b3e
--- /dev/null
+++ b/src/zc.adoc
@@ -0,0 +1,393 @@
+:sectnums:
+:version-label: v1.0.4-2
+:lifecycle-state: ratified
+
+[#Zc]
+== Zc* {version-label}
+
+=== Change history since v0.70.1 (tagged release)
+
+.Change history
+[width="100%",options=header]
+|====================================================================================
+|Version | change
+|v1.0.4-3 | Added misa.C clarification
+|v1.0.4-2 | Added rule that C implies Zca, Zcf, Zcd - discussed in https://github.com/riscv/riscv-isa-manual/issues/1132
+|v1.0.4-1 | Added rule that Zcf implies F and Zcd implies D - discussed in https://github.com/riscv/riscv-code-size-reduction/issues/221
+
+|v1.0.4 | Resolve https://github.com/riscv/riscv-code-size-reduction/issues/221 - Zcf doesn't exist on RV64 as it contains no instructions
+|v1.0.3-1 | Replace statement about non-idempotent memory handler completing the sequence (non-normative)
+|v1.0.3 | Add definition of Zce
+|v1.0.2 | Fix Architecture Review Committee feedback on instruction formats
+|v1.0.1 | Post public review fixes: Add instruction formats (issue 192). Clarify that Zcmt/Zcmp are for embedded CPUs (issue 190). Fix some typos.
+|v1.0.0-RC5.7| Add Zcb description and fix some typos. PUBLIC REVIEW REVISION.
+|v1.0.0-RC5.6| Remove Zcmpe which is _not_ frozen and is causing confusion
+|v1.0.0-RC5.5| Following ARC review Adjust the split so we have 224 cm.jalt and 32 cm.jt
+|v1.0.0-RC5.4| Change wording for dependencies to match arch manual "Zxxx requires Zyyy" changed to "Zxxx depends on Zyyy"
+|v1.0.0-RC5.3| Add dependency on Zicsr for Zcmt
+|v1.0.0-RC5.2| Adjust the split so we have 240 cm.jalt and 16 cm.jt
+|v1.0.0-RC5.1| Make cm.jt/cm.jalt only valid if JVT.mode=0, and allow different behaviour in the future if JVT.mode>0
+|v1.0.0-RC5| Revert to cm.jt and cm.jalt encodings, to avoid toolchain and trace problems
+|v1.0.0-RC4.1| Resolve typographical issues with the document only, no actual changes
+|v1.0.0-RC4| Release candidate
+| | Remove Zcmb as benefit is low. Remove cm.jalt, read LSB of jump table entry to determine whether to link
+|v0.70.5 | Resolve https://github.com/riscv/riscv-code-size-reduction/issues/163 - jvt.base is WARL and fewer bits than the max can be implemented
+|v0.70.4 | Clarified https://github.com/riscv/riscv-code-size-reduction/issues/159 - Need Zbb and Zba for RV64 and M/ZMmul to get _all_ of Zcb
+| | Resolved https://github.com/riscv/riscv-code-size-reduction/issues/161
+| | Resolved https://github.com/riscv/riscv-code-size-reduction/issues/160 - Allocated Smstateen bit 2 and added the relevant text
+|v0.70.3 | Added rule that Zcf and Zcmt imply Zca (this text was missing, this is not a spec change: https://github.com/riscv/riscv-code-size-reduction/pull/151)
+| | Added that Zcf is illegal for RV64, as it contains no instructions (clarification: https://github.com/riscv/riscv-code-size-reduction/issues/149)
+| | Added push/pop examples in the push/pop section
+|v0.70.2 | Stylistic changes only, removing redundant text.
+| | Corrected field names on JVT CSR diagram, and fixed synopsis for cm.mvsa01 +|==================================================================================== + +=== Zc* Overview + +This document is in the ratified state. No changes are allowed. Any desired or needed changes can be the subject of a follow-on new extension. Ratified extensions are never revised. + +Zc* is a group of extensions which define subsets of the existing C extension (Zca, Zcd, Zcf) and new extensions which only contain 16-bit encodings. + +Zcm* all reuse the encodings for _c.fld_, _c.fsd_, _c.fldsp_, _c.fsdsp_. + +.Zc* extension overview +[width="100%",options=header,cols="3,1,1,1,1,1,1"] +|==================================================================================== +|Instruction |Zca |Zcf |Zcd |Zcb |Zcmp |Zcmt +7+|*The Zca extension is added as way to refer to instructions in the C extension that do not include the floating-point loads and stores* +|C excl. c.f* |yes | | | | | +7+|*The Zcf extension is added as a way to refer to compressed single-precision floating-point load/stores* +|c.flw | |rv32 | | | | +|c.flwsp | |rv32 | | | | +|c.fsw | |rv32 | | | | +|c.fswsp | |rv32 | | | | +7+|*The Zcd extension is added as a way to refer to compressed double-precision floating-point load/stores* +|c.fld | | |yes | | | +|c.fldsp | | |yes | | | +|c.fsd | | |yes | | | +|c.fsdsp | | |yes | | | +7+|*Simple operations for use on all architectures* +|c.lbu | | | |yes | | +|c.lh | | | |yes | | +|c.lhu | | | |yes | | +|c.sb | | | |yes | | +|c.sh | | | |yes | | +|c.zext.b | | | |yes | | +|c.sext.b | | | |yes | | +|c.zext.h | | | |yes | | +|c.sext.h | | | |yes | | +|c.zext.w | | | |yes | | +|c.mul | | | |yes | | +|c.not | | | |yes | | +7+|*PUSH/POP and double move which overlap with _c.fsdsp_. 
Complex operations intended for embedded CPUs* +|cm.push | | | | |yes | +|cm.pop | | | | |yes | +|cm.popret | | | | |yes | +|cm.popretz | | | | |yes | +|cm.mva01s | | | | |yes | +|cm.mvsa01 | | | | |yes | +7+|*Table jump which overlaps with _c.fsdsp_. Complex operations intended for embedded CPUs* +|cm.jt | | | | | |yes +|cm.jalt | | | | | |yes +|==================================================================================== + +[#C] +=== C + +The C extension is the superset of the following extensions: + +* Zca +* Zcf if F is specified (RV32 only) +* Zcd if D is specified + +As C defines the same instructions as Zca, Zcf and Zcd, the rule is that: + +* C always implies Zca +* C+F implies Zcf (RV32 only) +* C+D implies Zcd + +[#Zce] +=== Zce + +The Zce extension is intended to be used for microcontrollers, and includes all relevant Zc extensions. + +* Specifying Zce on RV32 without F includes Zca, Zcb, Zcmp, Zcmt +* Specifying Zce on RV32 with F includes Zca, Zcb, Zcmp, Zcmt _and_ Zcf +* Specifying Zce on RV64 always includes Zca, Zcb, Zcmp, Zcmt +** Zcf doesn't exist for RV64 + +Therefore common ISA strings can be updated as follows to include the relevant Zc extensions, for example: + +* RV32IMC becomes RV32IM_Zce +* RV32IMCF becomes RV32IMF_Zce + +[#misaC] +=== MISA.C + +MISA.C is set if the following extensions are selected: + +* Zca and not F +* Zca, Zcf and F is specified (RV32 only) +* Zca, Zcf and Zcd if D is specified (RV32 only) +** this configuration excludes Zcmp, Zcmt +* Zca, Zcd if D is specified (RV64 only) +** this configuration excludes Zcmp, Zcmt + +[#Zca] +=== Zca + +The Zca extension is added as way to refer to instructions in the C extension that do not include the floating-point loads and stores. + +Therefore it _excluded_ all 16-bit floating point loads and stores: _c.flw_, _c.flwsp_, _c.fsw_, _c.fswsp_, _c.fld_, _c.fldsp_, _c.fsd_, _c.fsdsp_. 
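As an illustration only (not part of the committed text), the implication rules in the C and Zce sections above can be sketched as two small lookups. The function names, the `xlen` argument, and the use of Python sets are assumptions made for this sketch. The rules themselves are copied from the bullets above.

```python
def implied_by_c(xlen, base_exts):
    """Zc* extensions implied by C, per the rules listed in the C section.

    xlen is 32 or 64; base_exts is a set of single-letter extension
    names such as {"C", "F"}. Both parameters are illustrative.
    """
    implied = set()
    if "C" in base_exts:
        implied.add("Zca")                    # C always implies Zca
        if "F" in base_exts and xlen == 32:
            implied.add("Zcf")                # C+F implies Zcf (RV32 only)
        if "D" in base_exts:
            implied.add("Zcd")                # C+D implies Zcd
    return implied


def zce_members(xlen, has_f):
    """Extensions included when Zce is specified, per the Zce section."""
    members = {"Zca", "Zcb", "Zcmp", "Zcmt"}
    if xlen == 32 and has_f:
        members.add("Zcf")                    # Zcf does not exist on RV64
    return members
```

Under these assumptions, `implied_by_c(32, {"C", "F"})` yields Zca and Zcf, while the same call with `xlen=64` yields only Zca.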
+ +NOTE: the C extension only includes F/D instructions when D and F are also specified + +[#Zcf] +=== Zcf (RV32 only) + +Zcf is the existing set of compressed single precision floating point loads and stores: _c.flw_, _c.flwsp_, _c.fsw_, _c.fswsp_. + +Zcf is only relevant to RV32, it cannot be specified for RV64. + +The Zcf extension depends on the <> and F extensions. + +[#Zcd] +=== Zcd + +Zcd is the existing set of compressed double precision floating point loads and stores: _c.fld_, _c.fldsp_, _c.fsd_, _c.fsdsp_. + +The Zcd extension depends on the <> and D extensions. + +[#Zcb] +=== Zcb + +Zcb has simple code-size saving instructions which are easy to implement on all CPUs. + +All proposed encodings are currently reserved for all architectures, and have no conflicts with any existing extensions. + +NOTE: Zcb can be implemented on _any_ CPU as the instructions are 16-bit versions of existing 32-bit instructions from the application class profile. + +The Zcb extension depends on the <> extension. + +As shown on the individual instruction pages, many of the instructions in Zcb depend upon another extension being implemented. For example, _c.mul_ is only implemented if M or Zmmul is implemented, and _c.sext.b_ is only implemented if Zbb is implemented. + +The _c.mul_ encoding uses the CA register format along with other instructions such as _c.sub_, _c.xor_ etc. 
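A minimal sketch of the per-instruction dependency check described in the Zcb text above (illustrative, not part of the committed text). Only the two dependencies the text states are encoded. The table name, function name, and set-based representation are assumptions. Other Zcb instructions are deliberately left out rather than guessed.

```python
# Extra extensions a Zcb instruction needs in addition to Zca and Zcb.
# Each entry is a list of alternatives; satisfying any one is enough.
# Only the two dependencies stated in the text are listed here.
ZCB_EXTRA_DEPS = {
    "c.mul": [{"M"}, {"Zmmul"}],   # M or Zmmul
    "c.sext.b": [{"Zbb"}],         # Zbb
}


def zcb_insn_available(mnemonic, implemented):
    """Return True if `mnemonic` is usable given the implemented extensions."""
    if not {"Zca", "Zcb"} <= implemented:
        return False               # Zcb depends on Zca
    alternatives = ZCB_EXTRA_DEPS.get(mnemonic)
    if alternatives is None:
        raise KeyError("dependency not recorded in this sketch: " + mnemonic)
    return any(alt <= implemented for alt in alternatives)
```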
+ +[NOTE] + + _c.sext.w_ is a pseudo-instruction for _c.addiw rd, 0_ (RV64) + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|yes +|yes +|c.lbu _rd'_, uimm(_rs1'_) +|<<#insns-c_lbu>> + +|yes +|yes +|c.lhu _rd'_, uimm(_rs1'_) +|<<#insns-c_lhu>> + +|yes +|yes +|c.lh _rd'_, uimm(_rs1'_) +|<<#insns-c_lh>> + +|yes +|yes +|c.sb _rs2'_, uimm(_rs1'_) +|<<#insns-c_sb>> + +|yes +|yes +|c.sh _rs2'_, uimm(_rs1'_) +|<<#insns-c_sh>> + +|yes +|yes +|c.zext.b _rsd'_ +|<<#insns-c_zext_b>> + +|yes +|yes +|c.sext.b _rsd'_ +|<<#insns-c_sext_b>> + +|yes +|yes +|c.zext.h _rsd'_ +|<<#insns-c_zext_h>> + +|yes +|yes +|c.sext.h _rsd'_ +|<<#insns-c_sext_h>> + +| +|yes +|c.zext.w _rsd'_ +|<<#insns-c_zext_w>> + +|yes +|yes +|c.not _rsd'_ +|<<#insns-c_not>> + +|yes +|yes +|c.mul _rsd'_, _rs2'_ +|<<#insns-c_mul>> + +|=== + +<<< + +[#Zcmp] +=== Zcmp + +The Zcmp extension is a set of instructions which may be executed as a series of existing 32-bit RISC-V instructions. + +This extension reuses some encodings from _c.fsdsp_. Therefore it is _incompatible_ with <>, + which is included when C and D extensions are both present. + +NOTE: Zcmp is primarily targeted at embedded class CPUs due to implementation complexity. Additionally, it is not compatible with architecture class profiles. + +The Zcmp extension depends on the <> extension. + +The PUSH/POP assembly syntax uses several variables, the meaning of which are: + +* _reg_list_ is a list containing 1 to 13 registers (ra and 0 to 12 s registers) +** valid values: {ra}, {ra, s0}, {ra, s0-s1}, {ra, s0-s2}, ..., {ra, s0-s8}, {ra, s0-s9}, {ra, s0-s11} +** note that {ra, s0-s10} is _not_ valid, giving 12 lists not 13 for better encoding +* _stack_adj_ is the total size of the stack frame. +** valid values vary with register list length and the specific encoding, see the instruction pages for details. 
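The valid _reg_list_ values listed above can be enumerated mechanically. This is a sketch for illustration only (not part of the committed text). The function name and the string form of each list are assumptions. The set of lists, including the omission of {ra, s0-s10}, follows the bullets above.

```python
def valid_reg_lists():
    """Enumerate the register lists accepted by the PUSH/POP syntax."""
    lists = ["{ra}", "{ra, s0}"]
    # s0-s1 through s0-s9, then s0-s11; s0-s10 is not a valid list.
    for last in list(range(1, 10)) + [11]:
        lists.append("{ra, s0-s%d}" % last)
    return lists
```

The result has 12 entries, matching the "12 lists not 13" remark in the text.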
+ +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|yes +|yes +|cm.push _{reg_list}, -stack_adj_ +|<<#insns-cm_push>> + +|yes +|yes +|cm.pop _{reg_list}, stack_adj_ +|<<#insns-cm_pop>> + +|yes +|yes +|cm.popret _{reg_list}, stack_adj_ +|<<#insns-cm_popret>> + +|yes +|yes +|cm.popretz _{reg_list}, stack_adj_ +|<<#insns-cm_popretz>> + +|yes +|yes +|cm.mva01s _rs1', rs2'_ +|<<#insns-cm_mva01s>> + +|yes +|yes +|cm.mvsa01 _r1s', r2s'_ +|<<#insns-cm_mvsa01>> + +|=== + +<<< + +[#Zcmt] +=== Zcmt + +Zcmt adds the table jump instructions and also adds the JVT CSR. The JVT CSR requires a +state enable if Smstateen is implemented. See <> for details. + +This extension reuses some encodings from _c.fsdsp_. Therefore it is _incompatible_ with <>, + which is included when C and D extensions are both present. + +NOTE: Zcmt is primarily targeted at embedded class CPUs due to implementation complexity. Additionally, it is not compatible with architecture class profiles. + +The Zcmt extension depends on the <> and Zicsr extensions. + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|yes +|yes +|cm.jt _index_ +|<<#insns-cm_jt>> + +|yes +|yes +|cm.jalt _index_ +|<<#insns-cm_jalt>> + +|=== + +[#Zc_formats] +=== Zc instruction formats + +Several instructions in this specification use the following new instruction formats. 
+ +[%header,cols="2,3,2,1,1,1,1,1,1,1,1,1,1"] +|===================================================================== +| Format | instructions | 15:10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 +| CLB | c.lbu | funct6 3+| rs1' 2+| uimm 3+| rd' 2+| op +| CSB | c.sb | funct6 3+| rs1' 2+| uimm 3+| rs2' 2+| op +| CLH | c.lhu, c.lh | funct6 3+| rs1' | funct1 | uimm 3+| rd' 2+| op +| CSH | c.sh | funct6 3+| rs1' | funct1 | uimm 3+| rs2' 2+| op +| CU | c.[sz]ext.*, c.not | funct6 3+| rd'/rs1' 5+| funct5 2+| op +| CMMV | cm.mvsa01 cm.mva01s| funct6 3+| r1s' 2+| funct2 3+| r2s' 2+| op +| CMJT | cm.jt cm.jalt | funct6 8+| index 2+| op +| CMPP | cm.push*, cm.pop* | funct6 2+| funct2 4+| urlist 2+| spimm 2+| op +|===================================================================== + +NOTE: c.mul uses the existing CA format + +[#Zcb_instructions] +== Zcb instructions + +include::c_lbu.adoc[] +include::c_lhu.adoc[] +include::c_lh.adoc[] +include::c_sb.adoc[] +include::c_sh.adoc[] + +include::c_zext_b.adoc[] +include::c_sext_b.adoc[] +include::c_zext_h.adoc[] +include::c_sext_h.adoc[] +include::c_zext_w.adoc[] +include::c_not.adoc[] +include::c_mul.adoc[] + +include::pushpop.adoc[] +include::cm_push.adoc[] +include::cm_pop.adoc[] +include::cm_popretz.adoc[] +include::cm_popret.adoc[] +include::cm_mvsa01.adoc[] +include::cm_mva01s.adoc[] + +include::tablejump.adoc[] +include::jvt_csr.adoc[] +include::cm_jt.adoc[] +include::cm_jalt.adoc[] + -- cgit v1.1 From a209a72ac7978683f8907c23dd0138a5608ad962 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Wed, 31 Jan 2024 10:42:36 -0500 Subject: Add zc include to unpriv. Add zc include to unpriv. 
--- src/riscv-unprivileged.adoc | 1 + 1 file changed, 1 insertion(+) diff --git a/src/riscv-unprivileged.adoc b/src/riscv-unprivileged.adoc index 91a7a5d..0743542 100644 --- a/src/riscv-unprivileged.adoc +++ b/src/riscv-unprivileged.adoc @@ -127,6 +127,7 @@ include::zfa.adoc[] include::ztso-st-ext.adoc[] //ztso.tex include::rv-32-64g.adoc[] +include::zc.adoc[] //gmaps.tex include::extending.adoc[] //extensions.tex -- cgit v1.1 From ad672f9e8f8d9aa8f3f65b8c6aeda9eba2e25f8a Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Wed, 31 Jan 2024 14:33:09 -0500 Subject: Pulling in the Zc chapter. Pulling in the Zc chapter. --- .gitignore | 3 + dependencies/Gemfile | 1 + src/riscv-unprivileged.adoc | 2 +- src/zc.adoc | 393 ----- src/zc/.gitignore | 4 + src/zc/Zc.adoc | 393 +++++ src/zc/Zcb_footer.adoc | 12 + src/zc/Zcf_footer.adoc | 12 + src/zc/Zcmb_footer.adoc | 12 + src/zc/Zcmd.adoc | 22 + src/zc/Zcmd.pdf | 2387 +++++++++++++++++++++++++++ src/zc/Zcmd_footer.adoc | 12 + src/zc/Zcmp_footer.adoc | 12 + src/zc/Zcmpe_footer.adoc | 12 + src/zc/Zcmt_footer.adoc | 12 + src/zc/c_lbsb_imm_offset.adoc | 8 + src/zc/c_lbu.adoc | 46 + src/zc/c_lh.adoc | 48 + src/zc/c_lhsh_imm_offset.adoc | 8 + src/zc/c_lhu.adoc | 48 + src/zc/c_mul.adoc | 48 + src/zc/c_not.adoc | 50 + src/zc/c_sb.adoc | 46 + src/zc/c_sext_b.adoc | 48 + src/zc/c_sext_h.adoc | 49 + src/zc/c_sh.adoc | 48 + src/zc/c_zca_required.adoc | Bin 0 -> 60 bytes src/zc/c_zext_b.adoc | 52 + src/zc/c_zext_h.adoc | 49 + src/zc/c_zext_w.adoc | 51 + src/zc/changes_since_v0.50.adoc | 130 ++ src/zc/cm_decbnez.adoc | 50 + src/zc/cm_jalt.adoc | 74 + src/zc/cm_jt.adoc | 74 + src/zc/cm_lb.adoc | 47 + src/zc/cm_lbsb_imm_offset.adoc | 9 + src/zc/cm_lbu.adoc | 50 + src/zc/cm_lh.adoc | 51 + src/zc/cm_lhsh_imm_offset.adoc | 9 + src/zc/cm_lhu.adoc | 51 + src/zc/cm_mva01s.adoc | 62 + src/zc/cm_mvsa01.adoc | 65 + src/zc/cm_pop.adoc | 49 + src/zc/cm_pop_popret_loads_pseudo_code.adoc | 25 + src/zc/cm_pop_pseudo_code.adoc | 7 + src/zc/cm_popret.adoc | 49 
+ src/zc/cm_popret_pseudo_code.adoc | 9 + src/zc/cm_popretz.adoc | 49 + src/zc/cm_popretz_pseudo_code.adoc | 14 + src/zc/cm_push.adoc | 48 + src/zc/cm_push_pseudo_code.adoc | 7 + src/zc/cm_push_stores_pseudo_code.adoc | 25 + src/zc/cm_sb.adoc | 50 + src/zc/cm_sh.adoc | 51 + src/zc/example.bib | 40 + src/zc/jvt_csr.adoc | 65 + src/zc/pushpop.adoc | 349 ++++ src/zc/pushpop_extra_info.adoc | 22 + src/zc/pushpop_vars.adoc | 91 + src/zc/readme.md | 15 + src/zc/tablejump.adoc | 49 + src/zc/variable_def.adoc | 1 + 62 files changed, 5181 insertions(+), 394 deletions(-) delete mode 100644 src/zc.adoc create mode 100644 src/zc/.gitignore create mode 100644 src/zc/Zc.adoc create mode 100644 src/zc/Zcb_footer.adoc create mode 100644 src/zc/Zcf_footer.adoc create mode 100644 src/zc/Zcmb_footer.adoc create mode 100644 src/zc/Zcmd.adoc create mode 100644 src/zc/Zcmd.pdf create mode 100644 src/zc/Zcmd_footer.adoc create mode 100644 src/zc/Zcmp_footer.adoc create mode 100644 src/zc/Zcmpe_footer.adoc create mode 100644 src/zc/Zcmt_footer.adoc create mode 100644 src/zc/c_lbsb_imm_offset.adoc create mode 100644 src/zc/c_lbu.adoc create mode 100644 src/zc/c_lh.adoc create mode 100644 src/zc/c_lhsh_imm_offset.adoc create mode 100644 src/zc/c_lhu.adoc create mode 100644 src/zc/c_mul.adoc create mode 100644 src/zc/c_not.adoc create mode 100644 src/zc/c_sb.adoc create mode 100644 src/zc/c_sext_b.adoc create mode 100644 src/zc/c_sext_h.adoc create mode 100644 src/zc/c_sh.adoc create mode 100644 src/zc/c_zca_required.adoc create mode 100644 src/zc/c_zext_b.adoc create mode 100644 src/zc/c_zext_h.adoc create mode 100644 src/zc/c_zext_w.adoc create mode 100644 src/zc/changes_since_v0.50.adoc create mode 100644 src/zc/cm_decbnez.adoc create mode 100644 src/zc/cm_jalt.adoc create mode 100644 src/zc/cm_jt.adoc create mode 100644 src/zc/cm_lb.adoc create mode 100644 src/zc/cm_lbsb_imm_offset.adoc create mode 100644 src/zc/cm_lbu.adoc create mode 100644 src/zc/cm_lh.adoc create mode 100644 
src/zc/cm_lhsh_imm_offset.adoc create mode 100644 src/zc/cm_lhu.adoc create mode 100644 src/zc/cm_mva01s.adoc create mode 100644 src/zc/cm_mvsa01.adoc create mode 100644 src/zc/cm_pop.adoc create mode 100644 src/zc/cm_pop_popret_loads_pseudo_code.adoc create mode 100644 src/zc/cm_pop_pseudo_code.adoc create mode 100644 src/zc/cm_popret.adoc create mode 100644 src/zc/cm_popret_pseudo_code.adoc create mode 100644 src/zc/cm_popretz.adoc create mode 100644 src/zc/cm_popretz_pseudo_code.adoc create mode 100644 src/zc/cm_push.adoc create mode 100644 src/zc/cm_push_pseudo_code.adoc create mode 100644 src/zc/cm_push_stores_pseudo_code.adoc create mode 100644 src/zc/cm_sb.adoc create mode 100644 src/zc/cm_sh.adoc create mode 100644 src/zc/example.bib create mode 100644 src/zc/jvt_csr.adoc create mode 100644 src/zc/pushpop.adoc create mode 100644 src/zc/pushpop_extra_info.adoc create mode 100644 src/zc/pushpop_vars.adoc create mode 100644 src/zc/readme.md create mode 100644 src/zc/tablejump.adoc create mode 100644 src/zc/variable_def.adoc diff --git a/.gitignore b/.gitignore index e61db2e..0253b91 100644 --- a/.gitignore +++ b/.gitignore @@ -1,2 +1,5 @@ .DS_Store .*.swp +.vscode +src/.asciidoctor +src/diag* diff --git a/dependencies/Gemfile b/dependencies/Gemfile index 8cf7a50..f347221 100644 --- a/dependencies/Gemfile +++ b/dependencies/Gemfile @@ -2,6 +2,7 @@ source 'https://rubygems.org' gem 'asciidoctor' gem 'asciidoctor-bibtex' gem 'asciidoctor-diagram' +gem 'mathematical' gem 'asciidoctor-mathematical' gem 'asciidoctor-pdf' gem 'citeproc-ruby' diff --git a/src/riscv-unprivileged.adoc b/src/riscv-unprivileged.adoc index 0743542..4a5bab8 100644 --- a/src/riscv-unprivileged.adoc +++ b/src/riscv-unprivileged.adoc @@ -127,7 +127,7 @@ include::zfa.adoc[] include::ztso-st-ext.adoc[] //ztso.tex include::rv-32-64g.adoc[] -include::zc.adoc[] +include::zc/Zc.adoc[] //gmaps.tex include::extending.adoc[] //extensions.tex diff --git a/src/zc.adoc b/src/zc.adoc deleted file mode 
100644 index 0ec1b3e..0000000 --- a/src/zc.adoc +++ /dev/null @@ -1,393 +0,0 @@ -:sectnums: -:version-label: v1.0.4-2 -:lifecycle-state: ratified - -[#Zc] -== Zc* {version-label} - -=== Change history since v0.70.1 (tagged release) - -.Change history -[width="100%",options=header] -|==================================================================================== -|Version | change -|v1.0.4-3 | Added misa.C clarification -|v1.0.4-2 | Added rule that C implies Zca, Zcf, Zcd - discussed in https://github.com/riscv/riscv-isa-manual/issues/1132 -|v1.0.4-1 | Added rule that Zcf implies F and Zcd implies D - discussed in https://github.com/riscv/riscv-code-size-reduction/issues/221 - -|v1.0.4 | Resolve https://github.com/riscv/riscv-code-size-reduction/issues/221 - Zcf doesn't exist on RV64 as it contains no instructions -|v1.0.3-1 | Replace statement about non-idempotent memory handler completing the sequence (non-normative) -|v1.0.3 | Add definition of Zce -|v1.0.2 | Fix Architecture Review Committee feedback on instruction formats -|v1.0.1 | Post public review fixes: Add instruction formats (issue 192). Clarify that Zcmt/Zcmp are for embedded CPUs (issue 190). Fix some typos. -|v1.0.0-RC5.7| Add Zcb description and fix some typos. PUBLIC REVIEW REVISION. 
-|v1.0.0-RC5.6| Remove Zcmpe which is _not_ frozen and is causing confusion -|v1.0.0-RC5.5| Following ARC review Adjust the split so we have 224 cm.jalt and 32 cm.jt -|v1.0.0-RC5.4| Change wording for dependencies to match arch manual "Zxxx requires Zyyy" changed to "Zxxx depends on Zyyy" -|v1.0.0-RC5.3| Add dependency on Zicsr for Zcmt -|v1.0.0-RC5.2| Adjust the split so we have 240 cm.jalt and 16 cm.jt -|v1.0.0-RC5.1| Make cm.jt/cm.jalt only valid if JVT.mode=0, and allow different behaviour in the future if JVT.mode>0 -|v1.0.0-RC5| Revert to cm.jt and cm.jalt encodings, to avoid toolchain and trace problems -|v1.0.0-RC4.1| Resolve typographical issues with the document only, no actual changes -|v1.0.0-RC4| Release candidate -| | Remove Zcmb as benefit is low. Remove cm.jalt, read LSB of jump table entry to determine whether to link -|v0.70.5 | Resolve https://github.com/riscv/riscv-code-size-reduction/issues/163 - jvt.base is WARL and fewer bits than the max can be implemented -|v0.70.4 | Clarified https://github.com/riscv/riscv-code-size-reduction/issues/159 - Need Zbb and Zba for RV64 and M/ZMmul to get _all_ of Zcb -| | Resolved https://github.com/riscv/riscv-code-size-reduction/issues/161 -| | Resolved https://github.com/riscv/riscv-code-size-reduction/issues/160 - Allocated Smstateen bit 2 and added the relevant text -|v0.70.3 | Added rule that Zcf and Zcmt imply Zca (this text was missing, this is not a spec change: https://github.com/riscv/riscv-code-size-reduction/pull/151) -| | Added that Zcf is illegal for RV64, as it contains no instructions (clarification: https://github.com/riscv/riscv-code-size-reduction/issues/149) -| | Added push/pop examples in the push/pop section -|v0.70.2 | Stylistic changes only, removing redundant text. 
-| | Corrected field names on JVT CSR diagram, and fixed synopsis for cm.mvsa01 -|==================================================================================== - -=== Zc* Overview - -This document is in the ratified state. No changes are allowed. Any desired or needed changes can be the subject of a follow-on new extension. Ratified extensions are never revised. - -Zc* is a group of extensions which define subsets of the existing C extension (Zca, Zcd, Zcf) and new extensions which only contain 16-bit encodings. - -Zcm* all reuse the encodings for _c.fld_, _c.fsd_, _c.fldsp_, _c.fsdsp_. - -.Zc* extension overview -[width="100%",options=header,cols="3,1,1,1,1,1,1"] -|==================================================================================== -|Instruction |Zca |Zcf |Zcd |Zcb |Zcmp |Zcmt -7+|*The Zca extension is added as way to refer to instructions in the C extension that do not include the floating-point loads and stores* -|C excl. c.f* |yes | | | | | -7+|*The Zcf extension is added as a way to refer to compressed single-precision floating-point load/stores* -|c.flw | |rv32 | | | | -|c.flwsp | |rv32 | | | | -|c.fsw | |rv32 | | | | -|c.fswsp | |rv32 | | | | -7+|*The Zcd extension is added as a way to refer to compressed double-precision floating-point load/stores* -|c.fld | | |yes | | | -|c.fldsp | | |yes | | | -|c.fsd | | |yes | | | -|c.fsdsp | | |yes | | | -7+|*Simple operations for use on all architectures* -|c.lbu | | | |yes | | -|c.lh | | | |yes | | -|c.lhu | | | |yes | | -|c.sb | | | |yes | | -|c.sh | | | |yes | | -|c.zext.b | | | |yes | | -|c.sext.b | | | |yes | | -|c.zext.h | | | |yes | | -|c.sext.h | | | |yes | | -|c.zext.w | | | |yes | | -|c.mul | | | |yes | | -|c.not | | | |yes | | -7+|*PUSH/POP and double move which overlap with _c.fsdsp_. 
Complex operations intended for embedded CPUs* -|cm.push | | | | |yes | -|cm.pop | | | | |yes | -|cm.popret | | | | |yes | -|cm.popretz | | | | |yes | -|cm.mva01s | | | | |yes | -|cm.mvsa01 | | | | |yes | -7+|*Table jump which overlaps with _c.fsdsp_. Complex operations intended for embedded CPUs* -|cm.jt | | | | | |yes -|cm.jalt | | | | | |yes -|==================================================================================== - -[#C] -=== C - -The C extension is the superset of the following extensions: - -* Zca -* Zcf if F is specified (RV32 only) -* Zcd if D is specified - -As C defines the same instructions as Zca, Zcf and Zcd, the rule is that: - -* C always implies Zca -* C+F implies Zcf (RV32 only) -* C+D implies Zcd - -[#Zce] -=== Zce - -The Zce extension is intended to be used for microcontrollers, and includes all relevant Zc extensions. - -* Specifying Zce on RV32 without F includes Zca, Zcb, Zcmp, Zcmt -* Specifying Zce on RV32 with F includes Zca, Zcb, Zcmp, Zcmt _and_ Zcf -* Specifying Zce on RV64 always includes Zca, Zcb, Zcmp, Zcmt -** Zcf doesn't exist for RV64 - -Therefore common ISA strings can be updated as follows to include the relevant Zc extensions, for example: - -* RV32IMC becomes RV32IM_Zce -* RV32IMCF becomes RV32IMF_Zce - -[#misaC] -=== MISA.C - -MISA.C is set if the following extensions are selected: - -* Zca and not F -* Zca, Zcf and F is specified (RV32 only) -* Zca, Zcf and Zcd if D is specified (RV32 only) -** this configuration excludes Zcmp, Zcmt -* Zca, Zcd if D is specified (RV64 only) -** this configuration excludes Zcmp, Zcmt - -[#Zca] -=== Zca - -The Zca extension is added as way to refer to instructions in the C extension that do not include the floating-point loads and stores. - -Therefore it _excluded_ all 16-bit floating point loads and stores: _c.flw_, _c.flwsp_, _c.fsw_, _c.fswsp_, _c.fld_, _c.fldsp_, _c.fsd_, _c.fsdsp_. 
- -NOTE: the C extension only includes F/D instructions when D and F are also specified - -[#Zcf] -=== Zcf (RV32 only) - -Zcf is the existing set of compressed single precision floating point loads and stores: _c.flw_, _c.flwsp_, _c.fsw_, _c.fswsp_. - -Zcf is only relevant to RV32, it cannot be specified for RV64. - -The Zcf extension depends on the <> and F extensions. - -[#Zcd] -=== Zcd - -Zcd is the existing set of compressed double precision floating point loads and stores: _c.fld_, _c.fldsp_, _c.fsd_, _c.fsdsp_. - -The Zcd extension depends on the <> and D extensions. - -[#Zcb] -=== Zcb - -Zcb has simple code-size saving instructions which are easy to implement on all CPUs. - -All proposed encodings are currently reserved for all architectures, and have no conflicts with any existing extensions. - -NOTE: Zcb can be implemented on _any_ CPU as the instructions are 16-bit versions of existing 32-bit instructions from the application class profile. - -The Zcb extension depends on the <> extension. - -As shown on the individual instruction pages, many of the instructions in Zcb depend upon another extension being implemented. For example, _c.mul_ is only implemented if M or Zmmul is implemented, and _c.sext.b_ is only implemented if Zbb is implemented. - -The _c.mul_ encoding uses the CA register format along with other instructions such as _c.sub_, _c.xor_ etc. 
- -[NOTE] - - _c.sext.w_ is a pseudo-instruction for _c.addiw rd, 0_ (RV64) - -[%header,cols="^1,^1,4,8"] -|=== -|RV32 -|RV64 -|Mnemonic -|Instruction - -|yes -|yes -|c.lbu _rd'_, uimm(_rs1'_) -|<<#insns-c_lbu>> - -|yes -|yes -|c.lhu _rd'_, uimm(_rs1'_) -|<<#insns-c_lhu>> - -|yes -|yes -|c.lh _rd'_, uimm(_rs1'_) -|<<#insns-c_lh>> - -|yes -|yes -|c.sb _rs2'_, uimm(_rs1'_) -|<<#insns-c_sb>> - -|yes -|yes -|c.sh _rs2'_, uimm(_rs1'_) -|<<#insns-c_sh>> - -|yes -|yes -|c.zext.b _rsd'_ -|<<#insns-c_zext_b>> - -|yes -|yes -|c.sext.b _rsd'_ -|<<#insns-c_sext_b>> - -|yes -|yes -|c.zext.h _rsd'_ -|<<#insns-c_zext_h>> - -|yes -|yes -|c.sext.h _rsd'_ -|<<#insns-c_sext_h>> - -| -|yes -|c.zext.w _rsd'_ -|<<#insns-c_zext_w>> - -|yes -|yes -|c.not _rsd'_ -|<<#insns-c_not>> - -|yes -|yes -|c.mul _rsd'_, _rs2'_ -|<<#insns-c_mul>> - -|=== - -<<< - -[#Zcmp] -=== Zcmp - -The Zcmp extension is a set of instructions which may be executed as a series of existing 32-bit RISC-V instructions. - -This extension reuses some encodings from _c.fsdsp_. Therefore it is _incompatible_ with <>, - which is included when C and D extensions are both present. - -NOTE: Zcmp is primarily targeted at embedded class CPUs due to implementation complexity. Additionally, it is not compatible with architecture class profiles. - -The Zcmp extension depends on the <> extension. - -The PUSH/POP assembly syntax uses several variables, the meaning of which are: - -* _reg_list_ is a list containing 1 to 13 registers (ra and 0 to 12 s registers) -** valid values: {ra}, {ra, s0}, {ra, s0-s1}, {ra, s0-s2}, ..., {ra, s0-s8}, {ra, s0-s9}, {ra, s0-s11} -** note that {ra, s0-s10} is _not_ valid, giving 12 lists not 13 for better encoding -* _stack_adj_ is the total size of the stack frame. -** valid values vary with register list length and the specific encoding, see the instruction pages for details. 
- -[%header,cols="^1,^1,4,8"] -|=== -|RV32 -|RV64 -|Mnemonic -|Instruction - -|yes -|yes -|cm.push _{reg_list}, -stack_adj_ -|<<#insns-cm_push>> - -|yes -|yes -|cm.pop _{reg_list}, stack_adj_ -|<<#insns-cm_pop>> - -|yes -|yes -|cm.popret _{reg_list}, stack_adj_ -|<<#insns-cm_popret>> - -|yes -|yes -|cm.popretz _{reg_list}, stack_adj_ -|<<#insns-cm_popretz>> - -|yes -|yes -|cm.mva01s _rs1', rs2'_ -|<<#insns-cm_mva01s>> - -|yes -|yes -|cm.mvsa01 _r1s', r2s'_ -|<<#insns-cm_mvsa01>> - -|=== - -<<< - -[#Zcmt] -=== Zcmt - -Zcmt adds the table jump instructions and also adds the JVT CSR. The JVT CSR requires a -state enable if Smstateen is implemented. See <> for details. - -This extension reuses some encodings from _c.fsdsp_. Therefore it is _incompatible_ with <>, - which is included when C and D extensions are both present. - -NOTE: Zcmt is primarily targeted at embedded class CPUs due to implementation complexity. Additionally, it is not compatible with architecture class profiles. - -The Zcmt extension depends on the <> and Zicsr extensions. - -[%header,cols="^1,^1,4,8"] -|=== -|RV32 -|RV64 -|Mnemonic -|Instruction - -|yes -|yes -|cm.jt _index_ -|<<#insns-cm_jt>> - -|yes -|yes -|cm.jalt _index_ -|<<#insns-cm_jalt>> - -|=== - -[#Zc_formats] -=== Zc instruction formats - -Several instructions in this specification use the following new instruction formats. 
- -[%header,cols="2,3,2,1,1,1,1,1,1,1,1,1,1"] -|===================================================================== -| Format | instructions | 15:10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 -| CLB | c.lbu | funct6 3+| rs1' 2+| uimm 3+| rd' 2+| op -| CSB | c.sb | funct6 3+| rs1' 2+| uimm 3+| rs2' 2+| op -| CLH | c.lhu, c.lh | funct6 3+| rs1' | funct1 | uimm 3+| rd' 2+| op -| CSH | c.sh | funct6 3+| rs1' | funct1 | uimm 3+| rs2' 2+| op -| CU | c.[sz]ext.*, c.not | funct6 3+| rd'/rs1' 5+| funct5 2+| op -| CMMV | cm.mvsa01 cm.mva01s| funct6 3+| r1s' 2+| funct2 3+| r2s' 2+| op -| CMJT | cm.jt cm.jalt | funct6 8+| index 2+| op -| CMPP | cm.push*, cm.pop* | funct6 2+| funct2 4+| urlist 2+| spimm 2+| op -|===================================================================== - -NOTE: c.mul uses the existing CA format - -[#Zcb_instructions] -== Zcb instructions - -include::c_lbu.adoc[] -include::c_lhu.adoc[] -include::c_lh.adoc[] -include::c_sb.adoc[] -include::c_sh.adoc[] - -include::c_zext_b.adoc[] -include::c_sext_b.adoc[] -include::c_zext_h.adoc[] -include::c_sext_h.adoc[] -include::c_zext_w.adoc[] -include::c_not.adoc[] -include::c_mul.adoc[] - -include::pushpop.adoc[] -include::cm_push.adoc[] -include::cm_pop.adoc[] -include::cm_popretz.adoc[] -include::cm_popret.adoc[] -include::cm_mvsa01.adoc[] -include::cm_mva01s.adoc[] - -include::tablejump.adoc[] -include::jvt_csr.adoc[] -include::cm_jt.adoc[] -include::cm_jalt.adoc[] - diff --git a/src/zc/.gitignore b/src/zc/.gitignore new file mode 100644 index 0000000..feddacc --- /dev/null +++ b/src/zc/.gitignore @@ -0,0 +1,4 @@ +*.svg +.asciidoctor/ + + diff --git a/src/zc/Zc.adoc b/src/zc/Zc.adoc new file mode 100644 index 0000000..137824f --- /dev/null +++ b/src/zc/Zc.adoc @@ -0,0 +1,393 @@ +//:sectnums: +//:version-label: v1.0.4-2 +//:lifecycle-state: ratified + +[#Zc] +== Zc* {version-label} + +=== Change history since v0.70.1 (tagged release) + +.Change history +[width="100%",options=header] 
+|==================================================================================== +|Version | change +|v1.0.4-3 | Added misa.C clarification +|v1.0.4-2 | Added rule that C implies Zca, Zcf, Zcd - discussed in https://github.com/riscv/riscv-isa-manual/issues/1132 +|v1.0.4-1 | Added rule that Zcf implies F and Zcd implies D - discussed in https://github.com/riscv/riscv-code-size-reduction/issues/221 + +|v1.0.4 | Resolve https://github.com/riscv/riscv-code-size-reduction/issues/221 - Zcf doesn't exist on RV64 as it contains no instructions +|v1.0.3-1 | Replace statement about non-idempotent memory handler completing the sequence (non-normative) +|v1.0.3 | Add definition of Zce +|v1.0.2 | Fix Architecture Review Committee feedback on instruction formats +|v1.0.1 | Post public review fixes: Add instruction formats (issue 192). Clarify that Zcmt/Zcmp are for embedded CPUs (issue 190). Fix some typos. +|v1.0.0-RC5.7| Add Zcb description and fix some typos. PUBLIC REVIEW REVISION. +|v1.0.0-RC5.6| Remove Zcmpe which is _not_ frozen and is causing confusion +|v1.0.0-RC5.5| Following ARC review Adjust the split so we have 224 cm.jalt and 32 cm.jt +|v1.0.0-RC5.4| Change wording for dependencies to match arch manual "Zxxx requires Zyyy" changed to "Zxxx depends on Zyyy" +|v1.0.0-RC5.3| Add dependency on Zicsr for Zcmt +|v1.0.0-RC5.2| Adjust the split so we have 240 cm.jalt and 16 cm.jt +|v1.0.0-RC5.1| Make cm.jt/cm.jalt only valid if JVT.mode=0, and allow different behaviour in the future if JVT.mode>0 +|v1.0.0-RC5| Revert to cm.jt and cm.jalt encodings, to avoid toolchain and trace problems +|v1.0.0-RC4.1| Resolve typographical issues with the document only, no actual changes +|v1.0.0-RC4| Release candidate +| | Remove Zcmb as benefit is low. 
Remove cm.jalt, read LSB of jump table entry to determine whether to link +|v0.70.5 | Resolve https://github.com/riscv/riscv-code-size-reduction/issues/163 - jvt.base is WARL and fewer bits than the max can be implemented +|v0.70.4 | Clarified https://github.com/riscv/riscv-code-size-reduction/issues/159 - Need Zbb and Zba for RV64 and M/ZMmul to get _all_ of Zcb +| | Resolved https://github.com/riscv/riscv-code-size-reduction/issues/161 +| | Resolved https://github.com/riscv/riscv-code-size-reduction/issues/160 - Allocated Smstateen bit 2 and added the relevant text +|v0.70.3 | Added rule that Zcf and Zcmt imply Zca (this text was missing, this is not a spec change: https://github.com/riscv/riscv-code-size-reduction/pull/151) +| | Added that Zcf is illegal for RV64, as it contains no instructions (clarification: https://github.com/riscv/riscv-code-size-reduction/issues/149) +| | Added push/pop examples in the push/pop section +|v0.70.2 | Stylistic changes only, removing redundant text. +| | Corrected field names on JVT CSR diagram, and fixed synopsis for cm.mvsa01 +|==================================================================================== + +=== Zc* Overview + +This document is in the ratified state. No changes are allowed. Any desired or needed changes can be the subject of a follow-on new extension. Ratified extensions are never revised. + +Zc* is a group of extensions which define subsets of the existing C extension (Zca, Zcd, Zcf) and new extensions which only contain 16-bit encodings. + +Zcm* all reuse the encodings for _c.fld_, _c.fsd_, _c.fldsp_, _c.fsdsp_. + +.Zc* extension overview +[width="100%",options=header,cols="3,1,1,1,1,1,1"] +|==================================================================================== +|Instruction |Zca |Zcf |Zcd |Zcb |Zcmp |Zcmt +7+|*The Zca extension is added as way to refer to instructions in the C extension that do not include the floating-point loads and stores* +|C excl. 
c.f* |yes | | | | |
+7+|*The Zcf extension is added as a way to refer to compressed single-precision floating-point load/stores*
+|c.flw | |rv32 | | | |
+|c.flwsp | |rv32 | | | |
+|c.fsw | |rv32 | | | |
+|c.fswsp | |rv32 | | | |
+7+|*The Zcd extension is added as a way to refer to compressed double-precision floating-point load/stores*
+|c.fld | | |yes | | |
+|c.fldsp | | |yes | | |
+|c.fsd | | |yes | | |
+|c.fsdsp | | |yes | | |
+7+|*Simple operations for use on all architectures*
+|c.lbu | | | |yes | |
+|c.lh | | | |yes | |
+|c.lhu | | | |yes | |
+|c.sb | | | |yes | |
+|c.sh | | | |yes | |
+|c.zext.b | | | |yes | |
+|c.sext.b | | | |yes | |
+|c.zext.h | | | |yes | |
+|c.sext.h | | | |yes | |
+|c.zext.w | | | |yes | |
+|c.mul | | | |yes | |
+|c.not | | | |yes | |
+7+|*PUSH/POP and double move which overlap with _c.fsdsp_. Complex operations intended for embedded CPUs*
+|cm.push | | | | |yes |
+|cm.pop | | | | |yes |
+|cm.popret | | | | |yes |
+|cm.popretz | | | | |yes |
+|cm.mva01s | | | | |yes |
+|cm.mvsa01 | | | | |yes |
+7+|*Table jump which overlaps with _c.fsdsp_. Complex operations intended for embedded CPUs*
+|cm.jt | | | | | |yes
+|cm.jalt | | | | | |yes
+|====================================================================================
+
+[#C]
+=== C
+
+The C extension is the superset of the following extensions:
+
+* Zca
+* Zcf if F is specified (RV32 only)
+* Zcd if D is specified
+
+As C defines the same instructions as Zca, Zcf and Zcd, the rule is that:
+
+* C always implies Zca
+* C+F implies Zcf (RV32 only)
+* C+D implies Zcd
+
+[#Zce]
+=== Zce
+
+The Zce extension is intended to be used for microcontrollers, and includes all relevant Zc extensions.
+
+* Specifying Zce on RV32 without F includes Zca, Zcb, Zcmp, Zcmt
+* Specifying Zce on RV32 with F includes Zca, Zcb, Zcmp, Zcmt _and_ Zcf
+* Specifying Zce on RV64 always includes Zca, Zcb, Zcmp, Zcmt
+** Zcf doesn't exist for RV64
+
+Therefore common ISA strings can be updated as follows to include the relevant Zc extensions, for example:
+
+* RV32IMC becomes RV32IM_Zce
+* RV32IMCF becomes RV32IMF_Zce
+
+[#misaC]
+=== MISA.C
+
+MISA.C is set if the following extensions are selected:
+
+* Zca and not F
+* Zca and Zcf if F is specified (RV32 only)
+* Zca, Zcf and Zcd if D is specified (RV32 only)
+** this configuration excludes Zcmp, Zcmt
+* Zca, Zcd if D is specified (RV64 only)
+** this configuration excludes Zcmp, Zcmt
+
+[#Zca]
+=== Zca
+
+The Zca extension is added as a way to refer to instructions in the C extension that do not include the floating-point loads and stores.
+
+Therefore it _excludes_ all 16-bit floating point loads and stores: _c.flw_, _c.flwsp_, _c.fsw_, _c.fswsp_, _c.fld_, _c.fldsp_, _c.fsd_, _c.fsdsp_.
+
+NOTE: The C extension only includes F/D instructions when D and F are also specified.
+
+[#Zcf]
+=== Zcf (RV32 only)
+
+Zcf is the existing set of compressed single precision floating point loads and stores: _c.flw_, _c.flwsp_, _c.fsw_, _c.fswsp_.
+
+Zcf is only relevant to RV32; it cannot be specified for RV64.
+
+The Zcf extension depends on the <<Zca>> and F extensions.
+
+[#Zcd]
+=== Zcd
+
+Zcd is the existing set of compressed double precision floating point loads and stores: _c.fld_, _c.fldsp_, _c.fsd_, _c.fsdsp_.
+
+The Zcd extension depends on the <<Zca>> and D extensions.
+
+[#Zcb]
+=== Zcb
+
+Zcb has simple code-size saving instructions which are easy to implement on all CPUs.
+
+All proposed encodings are currently reserved for all architectures, and have no conflicts with any existing extensions.
+
+NOTE: Zcb can be implemented on _any_ CPU as the instructions are 16-bit versions of existing 32-bit instructions from the application class profile.
+
+The Zcb extension depends on the <<Zca>> extension.
+
+As shown on the individual instruction pages, many of the instructions in Zcb depend upon another extension being implemented. For example, _c.mul_ is only implemented if M or Zmmul is implemented, and _c.sext.b_ is only implemented if Zbb is implemented.
+
+The _c.mul_ encoding uses the CA register format along with other instructions such as _c.sub_, _c.xor_ etc.
+
+[NOTE]
+
+ _c.sext.w_ is a pseudo-instruction for _c.addiw rd, 0_ (RV64)
+
+[%header,cols="^1,^1,4,8"]
+|===
+|RV32
+|RV64
+|Mnemonic
+|Instruction
+
+|yes
+|yes
+|c.lbu _rd'_, uimm(_rs1'_)
+|<<#insns-c_lbu>>
+
+|yes
+|yes
+|c.lhu _rd'_, uimm(_rs1'_)
+|<<#insns-c_lhu>>
+
+|yes
+|yes
+|c.lh _rd'_, uimm(_rs1'_)
+|<<#insns-c_lh>>
+
+|yes
+|yes
+|c.sb _rs2'_, uimm(_rs1'_)
+|<<#insns-c_sb>>
+
+|yes
+|yes
+|c.sh _rs2'_, uimm(_rs1'_)
+|<<#insns-c_sh>>
+
+|yes
+|yes
+|c.zext.b _rsd'_
+|<<#insns-c_zext_b>>
+
+|yes
+|yes
+|c.sext.b _rsd'_
+|<<#insns-c_sext_b>>
+
+|yes
+|yes
+|c.zext.h _rsd'_
+|<<#insns-c_zext_h>>
+
+|yes
+|yes
+|c.sext.h _rsd'_
+|<<#insns-c_sext_h>>
+
+|
+|yes
+|c.zext.w _rsd'_
+|<<#insns-c_zext_w>>
+
+|yes
+|yes
+|c.not _rsd'_
+|<<#insns-c_not>>
+
+|yes
+|yes
+|c.mul _rsd'_, _rs2'_
+|<<#insns-c_mul>>
+
+|===
+
+<<<
+
+[#Zcmp]
+=== Zcmp
+
+The Zcmp extension is a set of instructions which may be executed as a series of existing 32-bit RISC-V instructions.
+
+This extension reuses some encodings from _c.fsdsp_. Therefore it is _incompatible_ with <<Zcd>>,
+ which is included when C and D extensions are both present.
+
+NOTE: Zcmp is primarily targeted at embedded class CPUs due to implementation complexity. Additionally, it is not compatible with architecture class profiles.
+
+The Zcmp extension depends on the <<Zca>> extension.
+
+The PUSH/POP assembly syntax uses several variables, whose meanings are:
+
+* _reg_list_ is a list containing 1 to 13 registers (ra and 0 to 12 s registers)
+** valid values: {ra}, {ra, s0}, {ra, s0-s1}, {ra, s0-s2}, ..., {ra, s0-s8}, {ra, s0-s9}, {ra, s0-s11}
+** note that {ra, s0-s10} is _not_ valid, giving 12 lists not 13 for better encoding
+* _stack_adj_ is the total size of the stack frame.
+** valid values vary with register list length and the specific encoding, see the instruction pages for details.
+
+[%header,cols="^1,^1,4,8"]
+|===
+|RV32
+|RV64
+|Mnemonic
+|Instruction
+
+|yes
+|yes
+|cm.push _{reg_list}, -stack_adj_
+|<<#insns-cm_push>>
+
+|yes
+|yes
+|cm.pop _{reg_list}, stack_adj_
+|<<#insns-cm_pop>>
+
+|yes
+|yes
+|cm.popret _{reg_list}, stack_adj_
+|<<#insns-cm_popret>>
+
+|yes
+|yes
+|cm.popretz _{reg_list}, stack_adj_
+|<<#insns-cm_popretz>>
+
+|yes
+|yes
+|cm.mva01s _rs1', rs2'_
+|<<#insns-cm_mva01s>>
+
+|yes
+|yes
+|cm.mvsa01 _r1s', r2s'_
+|<<#insns-cm_mvsa01>>
+
+|===
+
+<<<
+
+[#Zcmt]
+=== Zcmt
+
+Zcmt adds the table jump instructions and also adds the JVT CSR. The JVT CSR requires a
+state enable if Smstateen is implemented. See <> for details.
+
+This extension reuses some encodings from _c.fsdsp_. Therefore it is _incompatible_ with <<Zcd>>,
+ which is included when C and D extensions are both present.
+
+NOTE: Zcmt is primarily targeted at embedded class CPUs due to implementation complexity. Additionally, it is not compatible with architecture class profiles.
+
+The Zcmt extension depends on the <<Zca>> and Zicsr extensions.
+
+[%header,cols="^1,^1,4,8"]
+|===
+|RV32
+|RV64
+|Mnemonic
+|Instruction
+
+|yes
+|yes
+|cm.jt _index_
+|<<#insns-cm_jt>>
+
+|yes
+|yes
+|cm.jalt _index_
+|<<#insns-cm_jalt>>
+
+|===
+
+[#Zc_formats]
+=== Zc instruction formats
+
+Several instructions in this specification use the following new instruction formats.
+
+[%header,cols="2,3,2,1,1,1,1,1,1,1,1,1,1"]
+|=====================================================================
+| Format | instructions | 15:10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
+| CLB | c.lbu | funct6 3+| rs1' 2+| uimm 3+| rd' 2+| op
+| CSB | c.sb | funct6 3+| rs1' 2+| uimm 3+| rs2' 2+| op
+| CLH | c.lhu, c.lh | funct6 3+| rs1' | funct1 | uimm 3+| rd' 2+| op
+| CSH | c.sh | funct6 3+| rs1' | funct1 | uimm 3+| rs2' 2+| op
+| CU | c.[sz]ext.*, c.not | funct6 3+| rd'/rs1' 5+| funct5 2+| op
+| CMMV | cm.mvsa01 cm.mva01s| funct6 3+| r1s' 2+| funct2 3+| r2s' 2+| op
+| CMJT | cm.jt cm.jalt | funct6 8+| index 2+| op
+| CMPP | cm.push*, cm.pop* | funct6 2+| funct2 4+| urlist 2+| spimm 2+| op
+|=====================================================================
+
+NOTE: c.mul uses the existing CA format
+
+[#Zcb_instructions]
+== Zcb instructions
+
+include::c_lbu.adoc[]
+include::c_lhu.adoc[]
+include::c_lh.adoc[]
+include::c_sb.adoc[]
+include::c_sh.adoc[]
+
+include::c_zext_b.adoc[]
+include::c_sext_b.adoc[]
+include::c_zext_h.adoc[]
+include::c_sext_h.adoc[]
+include::c_zext_w.adoc[]
+include::c_not.adoc[]
+include::c_mul.adoc[]
+
+include::pushpop.adoc[]
+include::cm_push.adoc[]
+include::cm_pop.adoc[]
+include::cm_popretz.adoc[]
+include::cm_popret.adoc[]
+include::cm_mvsa01.adoc[]
+include::cm_mva01s.adoc[]
+
+include::tablejump.adoc[]
+include::jvt_csr.adoc[]
+include::cm_jt.adoc[]
+include::cm_jalt.adoc[]
+
diff --git a/src/zc/Zcb_footer.adoc b/src/zc/Zcb_footer.adoc
new file mode 100644
index 0000000..1c8122d
--- /dev/null
+++ b/src/zc/Zcb_footer.adoc
@@ -0,0 +1,12 @@
+
+Included in::
+[%header,cols="4,2,2"]
+|===
+|Extension
+|Minimum version
+|Lifecycle state
+
+|Zcb (<<Zcb>>)
+|{version-label}
+|{lifecycle-state}
+|===
diff --git a/src/zc/Zcf_footer.adoc b/src/zc/Zcf_footer.adoc
new file mode 100644
index 0000000..62f336a
--- /dev/null
+++ b/src/zc/Zcf_footer.adoc
@@ -0,0 +1,12 @@
+
+Included in::
+[%header,cols="4,2,2"]
+|===
+|Extension
+|Minimum
version
+|Lifecycle state
+
+|Zcf (<<Zcf>>)
+|{version-label}
+|{lifecycle-state}
+|===
diff --git a/src/zc/Zcmb_footer.adoc b/src/zc/Zcmb_footer.adoc
new file mode 100644
index 0000000..ac73f23
--- /dev/null
+++ b/src/zc/Zcmb_footer.adoc
@@ -0,0 +1,12 @@
+
+Included in::
+[%header,cols="4,2,2"]
+|===
+|Extension
+|Minimum version
+|Lifecycle state
+
+|Zcmb (<>)
+|v0.70.5
+|{lifecycle-state}
+|===
diff --git a/src/zc/Zcmd.adoc b/src/zc/Zcmd.adoc
new file mode 100644
index 0000000..a5ff18e
--- /dev/null
+++ b/src/zc/Zcmd.adoc
@@ -0,0 +1,22 @@
+[#Zcmd]
+==== Zcmd v0.1
+
+This document is in the Development state. Assume everything can change. For more information see:
+https://riscv.org/spec-state
+
+[%header,cols="^1,^1,4,8"]
+|===
+|RV32
+|RV64
+|Mnemonic
+|Instruction
+
+|✓
+|✓
+|cm.decbnez t0, imm
+|<<#insns-cm_decbnez>>
+
+|===
+
+include::cm_decbnez.adoc[]
+
diff --git a/src/zc/Zcmd.pdf b/src/zc/Zcmd.pdf
new file mode 100644
index 0000000..9035979
Binary files /dev/null and b/src/zc/Zcmd.pdf differ
ÕªlÍŒWQ;†GCjÇX­ÊÕ¡¡pè¾Ñø¿3æ‡~£7ýïŽùÃ!ÕU{/ŒÑcc0Ÿ¡Æ:1^«ò5Z¾«+×&&ü*‚i„­œ6ul5k²’‰×ª¦å"Yä˜FQÙŠ¾°¢r•ý*]Ê.M*¤Ðä…ÆüK´v@¯‘E:É/…`Fsò&eÇR£ÄU!:1ª(»Ã½“SF•é;õ)H?+Y–V–”ÝK½“á%e)L— “ÉÕè ü‘µ#K*0ÆFWj]ó†B~em ăú€šCÚB´›½&¬¬+£#þŠÇF—€¡¾ðRXYê[ +O’úò¨U%¢Ð-HÁñÿ0°DáÉØÎ ê¬&–®±õO‡—UÝé_«;ø*Þ¸¢âÇЀjÕ0þô˜ÖK¤¥J`®Px`¬¸šEª ô¨lt—**=9aŒ‰ÒŠ *bq¡b„Ši P1“n{ô +V-…ÁVhµ<ÇD5¶÷TmJÏ×åÒÚj±C[Ófu hÜhÝ8{ò%æ‹Œ>¦ OØ„zÑ€k5¢*´ 6VÚ »ý0xŽ$x¸ZTŽFQ3Š¢8 + lÏajú +ÿ^Aï`/nÄ£øwñ¯˜:æ5v;Ì®rî w™ûcî#Ã!Çwñ§ù…á‹F£qÈøçÆMm¢MOŠ«â[æ!ógÍoYvZÎXÞ³ÖX/XW¬Ø>cûÐÞdµÿRrH ðÎ"™÷˜›à±yðädGï¡œh@QÕ’Ô°°®¢¸&ÀØÀªWÑÚ +gE +Uù„ÊI9d“¢ª=±ÂêòšÚèÓ8ËUNåbÕ²ª Ò Nµ¯j¼ñªA5Æ8•_ÕlÒ ƒ*®ª¼´làÎ(¼4ÂdËÎâŒH³i«Y€Š`'Í¢´lmÐ,Úho³Þš­[ÍÐÛJz£À)šÌV›=Vøá ÄIš-Ö͵ÇQ]}ZÉ&ì ™XË"ó㛣ؕ'ÌTæßÍ¿‹Øù>v‰Ì?ß´žÄ.ì¼ù¸ŸÊ¿}¾‡‰W<¼Áã½Ì{ =r+‘’&XÖUÛ™_J'ywK¥8\9üͶ¦âc+mßü÷ýÅÎê§~‘ÿî3eÏÁi˜£æ鎸ʯiv˜ÃIæð´q©†IÉ’ö)™?j÷øZWW? T=ñù±£zlÒx6ÿ—ùžE„¦¡t £U>¾‚ÍHä¢X5ÆU¼ÊÓ˜œKh¬e}Å`E.PžiM3`Ù¡ +X2™Jºí8œ>qï¨öìµ]¼HçÑ·»|öBÄ6ÀDÈ…U.®!]aíÈs(ë©[Äâ·ZZNÜ M,Š¡€(¡ÊËE·—ÉDD~@¹ô‰1'<s4œ¾Î¼ vÚ„¨Eê&Ç°`rLLeW5³pŒ‹YfXÑL5¿U*hÜÓÆÉD‘Û%`>\VÙdÂ;Å"‹CÈ¿lqYÿ øØ¥Ù*¸?úxÐA€ÊŽŸáט/¢صÍ(çjµra=gâ 'Ì7Pšƒ@oöÄ)ªIPN­i;‚²Cóš2D¤ÎTC›†•ƒl vÙX±iWM&Ú˜VÜcÊlŒ{+š«J•@Cs4ÀJ%ÍU¡` ÙRrø*‡§ªº¸8Zƒn”#ÍÇ»;í{mï'†Ý•;'ºÛíf[´÷wöwg3a³;µ0Ò2xºÎᨛÐu0·//ðR9#áÄ»Ûȱ„ÌJÊ*(R\§v,‚EP;Ö "›AFP0g7‰tRNºC©¤Ìß³Ûy1ýb SÝÜuó÷[ZÞ£¯·nß‚õÂèU”³“õ,æõC+Ö—ÃvÆ:/«>}-kB­ µ¯ª–˜j\Õ$ç Õ½ªZÀXÀ-,‹ä®JÒ²,¹¡ê w'½»È=í$ž„y›ìȨðß™Q]™Œ#uSu˜DI6šN—;¶å :DÑòk­ºÍh%aÙ‘3øP&C8×ÇƼPü§Û„TCetgÇ;êŸ\ΖŽ×uES=ÍCXí!çL²Åd„š›ö0¸û³ßqítï?•ŽD[ûOÍõµI±¾K®•žê¢Ú#5 w´›ºoÀ/9*!!I w@Ð òƒ@.2×ÁYÐ3º×9Éd2Ç(…“+2?èyλԹ™qè½U¿åA@×i,„AÈÌ»lœ€“é¤;ÍŠxñé7ðO1Æ,6àü§þ›ç Å6ÑÆdp»—5Ž10sóUÜE÷nÍÆ¿à'ðw ~GÐ(ç"Z-Þ2¿ˆàºÃ†b»5ºQ†@ÿ*º…Vس°Ë`#\‰E3;  §f±BÐ+“s¢½˜„?§¸CZ© „ïË–5aQ P·dT»¬ZA%iS•·‚Þà×h!žnZÍ`&{ÔC·5£–ʪ‘†{VwL²ŽBaFÜ)È!þƒtôI‡¯«T0žÚÝ¢±;ùµ!BKŽÊ³¡œ‰È“ÄyRºü„.E÷þ@qý&b:n0"ÏКf é$‘§ßE$f¶/%•Í¿Í"åF‰Y„O÷ö îëé¿sè`²ºÓ­T3Þx,ðƒ©kiÜy3ÖÐþè±±ß7® £¢—Ð]t?2tAÆØrFs[Ô®¸E aËMâk.‚Í—‚@­@*eTQVíÕM¼s ãq‡‚xSŽédŒÉ–Ý»?ؘ.g󯱇[ùÁÇ?1ð+¾4RjKvýãÔ5gÍhkÉÀ±tWò±'#ãÑ1`j|óÐq»ž]äL LÍF¬HTIê{] ªnÏšÊ$Ç­«r‚*݆A‚¼…¦B)ð•©­œ/ÁIw|³¿g»+ª£•e-ŸÀ¶ô¢œ˜2wËUÉk-ùûñƒ€´ü üüרrÑ~ô]”k!’ +ClÄTÕ„9©íTïH@ª£¥ 亰ÅJa +Vâªiœb´uHY“µz(˜¤E?Žj—r¦0…bh-–r=MPŽBypM«7 
¾¢`ê`F푯Âå±T‹…ÄÁbDz½Èæ¢m¢FŽ´–1h†ø•É¨‡Æ)Pë’Éø=ŽeÞTTL[dÍæ!òñ4¤A:éTADà{<Ô6.Ù-˜¨Voã ê #Švº +Aëï(. ïHÆ™þs°„ÝcŠL쪉›½Ê@¸¾¯^4<8qô“½]¥íã¿ß^RR®ßQ–&×îÎÊòªÚŽ6íž1¥í,gL5vÞ,²»B¹6ì’›‡†2¾ +{´¬³Ux¤.SŸ‹œk¬Àþ󢽺7S-Ém~ÌUðYª'¡²­ c uš•Áˆ8HÒ­¹|›÷ +²n°”jÇ%xx`çÁÊF¿Âfg$ªWõŸv{CÉ°7ÿ6.–]EòxÇ/äŠýàžfþ#äŠIš+6 Œ¿ôŽ½ßzÍèrál Ém«œÉ‘N°˜( q–»Ê©,¤ˆÜ*§‰>ˆ°†UÍDá9^‚ã9µ„˜Op9/ ҚȀ8_âÏoÏ oq9èÜ#{^ÿÒ7|~¸òýí³ùŸ¿}àÓÔßî‡ûçæmy¿-ï°®SR·òŽýÌ›7cºn˜×`ïT“Š¤ ðJƒMɤV ;Ø;8J'ª0 µ|c)(H?©z F—¢ ’Ù@výȪfM+­Öµ”N°¹Wª~S³øåÞýû‹C !ªZÆèuûœVPî65£Mû½¨ç–ýQEâä–õs^9/0·Ýú¼”ÛíÎýëv¤V¤vn;uEJ}¥ï6³#~=~ý¿ƒ_‘íÒãušÔ¯; =E”3`b$ †‡²ò'3ÈÏÄ’ä$±G™­( [Ÿ†Í­üäø±û/« š?t¨)}øÙGŸ<1uùÊÙÑñ;ŽSPD±F+~dC°ÆH‡Ž‚×­ÀKï#ÒpßBî[HÃaÖ‘†—" ¬ËÌ£ËliÈ·\’·cÀG{Ææú[3UMcƒ;Í.ÙnM–ý6ó@}ËÒÑS—3ïã΢ƃ»²¬ ˜Œ"O>5S»Ç½”þÍ|^((ÖHR"»N´•*V‚ cÜ®Ä6&í¶±%x¯¥l¯!Æ/;ÃU¸ËƒÉÿJbèê4èª F¹táœ3çÛ:8ËÄU÷ Q)綀@Pn†&7±tœ!nëy“ì+‹¥‰KO8´€B¤dJë9SäLî’yçs¨š3ùBÒÃ4k"g26™(*$Mvú-L‘¹Ø(•Ñr—Ø#ïh·…zêöô›±›u +Ñiw™c•©®ò$¼ˆ–íÂVÞ"Ùj_qˆgmk{ܳ#Z\Ú…MŒ‘‘ã +o¨D6Iµžš?Ý?Á Ó‚œä¬Wbt§‡xƒâp®¸ê !Þd%_@,H3“|Þ–Q‘ŽŸI˜Oº!²»]ü‘ðpïc5VǤÝ{@|¿ÿþ–_ µT>€­úÞ…µWðKȃvëç¾: âÅpÅÞÍ“~8™~I¢Ð²ö‰ràÌSDD>Ô$eÓ©M´c‡‡pi]C;‡‡wݬv›ñ}¸;¯…ƒérщ½ù7ZLi‚Üh2¡}4IHÑOÒBÈ2ÍzZhÚžšiZh¢L /CògÚ:(À¦­œÏ9Ÿ3Ɇm{F ÆËýÏŽâ—>¸¯$‰GóÝ…µ™?‚µ]€vh6¤Z“9D–7˜Éòn²“cM²~Þœt¹7WÛâaI3{jw_˜5|¬oéÈ.E`NïÃ/å_?Ò†÷~pq=ÏwÿÏ…Ædþ˨ðçøuX߆͛™‰ÅFNQ9â0ØBn ÛNÒ… P!©lâ Õ(i‚x”u"%9‘`%ôSRd$-cH¢x±Ì°äÔ”~Ñ3;£ô%O^‚óY6ÙŒ¤¼ÉKÒÝJºÃÔFòÞJÞç`®m9,ŸÉÁ0rnk39XŠœ€\ÇŒbñæÇ=Üa¿uPKÎvl±Û…]ŸNE’Hp·ÂôË/Ž,á+#_}uä³Oü˜ùþ3Ï|ÿäÅuÛÙHRÛ)&'QºÙ€ÈMh2A¢[¹:*¦‚+^Õp˜Î&Žà(’n¨®UfÅä**f +i8ÂÆ[Õ­ÃÙ¢ï?,k»n‰úÉnë'ñéƆˆm_G‡™gø{ðE;CÊå÷Ž3à—~ï‹ÑH/î¹¹Þ]gì”#»dÑd7ñÿ“À‹eóä~T AƒˆÇÝv‚Ž™«`÷œŠWÉé¹A5®²`÷9{ÄŒ`ÜÜ +äà@?5 Ç~,ˆ³ùÿý7îØ<†wVå¿ŒÛÉÚ6ôó.Pѯ‘h­ðýñÖgHòNnH@°)»PUõûlIÉGKˆ ³(]øk%; —vô0ʵúS@ à¦NDÅ\4'¼„k|ÅE›°ÚAꊤ/B2âôíëjBÊ)^)ºbÒ_u HÈŽ¬.®$\ÑÜ +i©VPHsFË/`$yMJeF?IS+@;ùRBN +éy;KQ|Ö³qä)œž [ó¸p0Ÿ~ä§Wî|áÚ¥Ÿ\¹ü“;/½yïï|ïâàÅ/}#ÿþ·¾ù™žë4: ¦z¬w=öè|E“Gá£Íé'^žÁeï_~䕳Oþ¯«×ðƒÿèâ¥^É>þ§oÜõ”w„±@bϳÌ}û†>aµ8{<ÒmaMÐû6¬ÉþV¬ þ¤Æõ¡¯ ?ÃF!ß|É—Z-<~ãyÂEßøWÆÆèß38Ñ϶þ`)ºé¨OŒÊ @T(³€JÛ eúÌÊ(ß](ó0ç•­¿˜*CŸ+”­è ôµBÙ†|ø˜ s&¨ÆV(cÀ7 +eYw¡Ì¢ƒL¨PæP€9_( üd¡lA»˜ç +e+~Šy¿P¶¡îñî¹Ù Ùs‹ÙiåÎ{•ÎÙésÙ{•Ã1exrvî‚ròÜÜYå`vT997»¸Ð½÷229» 
ŒdÏ͜ܛž9¶^ôÎ;•U±:¥Y¹½4Ô7Õ6Ö&êê2·¿9œ=·037«ÔÅÒ0îκt]*….¤ípzqq¾9Ÿ:[{~vfjn:[˜;n*{’,›Í.ÆžžYPÈúÊÈÜÉÅ»'Ïeh833•]vÎÏNgÏ)‹§³ÊHß ²o>;«wÔ;Ô(›$ÔÇêc"¬0–L357?“,ÎÊÂç”»gOÃdð†ˆB™Ÿœºk˜ž™UöõÆïY¬'g§ÉÈÉ3 sÊä…É™3“wžÉê'•ÞÎýÊäb³R`kaêÜÌüâBlaæL ŠÃ$ÿúÔ éÒ,º€²è8ò,š†­p'ºîÐ> ­YZ; °TdZç ¿‚N»9tJ¡Ï(m!s-¢˜u/:-#´ÿ-‘f Ï^ºÊ :c{ #Èó:o”€•êàÙ ×o›GïQ€¶5•€Qu°i~ۘô´e²®ýc(]Xï4бˆæaÖ8ü›êjÆYè;½§ad FÎAÛ9hÉRnušcÐ+ cã ‰ÓП¬¹ÉY}ú.ÂV¤ÒT +=ÎЙ³”F]îç©Ä … +ô?MûŽ€³„ç> ,Ki¾5óàm3ÔP-ÝÎ_=ÐF.qe·¯»I áqÊ:%‹P;Ey:] çnxGj:eú˜“[”ÌoSè.¸Ÿ*Ì9KiîcÐã¸j€ŠIÊáæš“@ ‘()]€kê“`}gè·Vœ¤\w‚»&åEª÷Ûµµ« ÏSë‹Q œ§®¡x’Ïê‹éoã¡åo;SÔG›éߊIôï•œ€jÝ€ŒJP9ª„±˜¯GIèÙæ–A­-÷Àô»ÑPo?€…†`Áa`ñ¨ü(òl©14¹ì:ŠŽ£GÑ÷Ñ·ÿ/ë©iˆ +endstream +endobj +34 0 obj +<< /Type /FontDescriptor +/FontName /b0705e+CMUSansSerif +/FontFile2 33 0 R +/FontBBox [-1135 -354 1481 1191] +/Flags 4 +/StemV 0 +/ItalicAngle 0 +/Ascent 800 +/Descent -200 +/CapHeight 800 +/XHeight 0 +>> +endobj +35 0 obj +<< /Length 1278 +/Filter [/FlateDecode] +>> +stream +xœe×ËnÛF†á½®BËtHs&Ã@‘n¼èu{stÔ’ + ß}ù½¤ikÀÆ/‰œy¾_Ã!}øôôÓÓùtß~»]ês¿ïÇéÜnýíòåVû¾ô—Óygì¾êýë+þÖ×|ݶ“Ÿßßîýõé<.û‡‡Ýá÷í÷ûí}ÿáÇv)ý‡Ýá×[ë·ÓùeÿáÏOÏÛëç/×ë_ýµŸïûãîñqßúØú9_ɯ}à´Omûütÿ¸óϼ_ûÞòÚLL½´þv͵ßòù¥ïŽÇLJ1wýÜþó‘9ç)eÔÏù6=n?[i(JKiU:J§ÒSz•2¨Œ”Qe¢L*ÊEåJ¹ªÌ”Ye¡,*+eUÙ(›ÊNÙUÊ-уÁkä5x¼¯‘×à5ò¼F^ƒ×Èkðy ^#¯Ákä5x¼¯‘×à5ò¼F^ƒ×Èkðy ^#¯Åkåµx­¼¯•×âµòZ¼V^‹×ÊkñZy-^+¯Åkåµx­¼¯•×âµòZ¼V^‹×ÊkñZy-^+¯Ãëäux¼¯“×áuò:¼N^‡×Éëð:y^'¯Ãëäux¼¯“×áuò:¼N^‡×Éëð:y^'¯Çëåõx½¼¯—×ãõòz¼^^×Ëëñzy=^/¯Çëåõx½¼¯—×ãõòz¼^^×Ëëñzy=^/oÀä xƒ¼o7à ò¼AÞ€7ÈðyÞ oÀä xƒ¼o7à ò¼AÞ€7ÈðyÞ oÄåx£¼o”7âÝþj·ùº«üo—‰$‰JI•$’$*I$IT’H’¨$‘$QI"I¢’D’D%‰$‰JI•$’$*I$IT’D’¤$‰$III’’$’$u>áMò&¼IÞ„7É›ð&yÞ$o›äMx“¼ o’7áMò&¼IÞ„7É›ð&y¼U†o•aÁ[5ñ‚·j¶oS orÁÛ4Û‚·sÞ®>,x»Â/x;³áíê·31ÞÎÄx;ãí +¿âíJ¼âíJ¼â특âíâ¬x»¯x»¯x‡+Þ!Êw¹âB®x‡b®x‡+ÞïrÅ;àà2äÍkçÎœñJ¼Y³e¼Y†Œ7+|œÆx‹ oSŠŒ7+[Æ›ețךyÞ¢oVûòæµ– -ã­Œ‹·é€"¯åþR¦W*–Ó4Xq”êC™^Jú[(£à®^ƒ1»¢y]•¡Èk¹}¼YM-x e#e®ÊÙ_y+ë¡hÜŠ7k¶:û«/ âåæZå5«dUýµl¼uz5n¥¿[ç¾ÛeBø÷&Si|y»(ô%q& %+S%ñ*Aª¾ÛÊBijPUÃ6\¥h,ž(¶+L¥ºÝ,¥‚4G©5Þ<¥ô…ÂcD› [˧©ñ¡*t› [ 
jj¼aijü¼IµÙxʪw¹g¶FÉ›wžØ¶¥§Rô>¶Þís¡hâ>½:­O¯VRÇËí¾³P*#àÙi|e0¼Uýí,ì*o§¿Uáûì/ãÎþ*[§¿Uê,”¦Nö¹°•m€lšx°:š&\}M³¹»)üÒðL2„´\%CHËF7æÕ§‰ÇÊ»BŽLÉ…’)æjæ€Æ»J1:%¦H߯:=Së©ÿÛ³zýr»méükÀó¹žÌOçþí¿‡ë媳ôû7±Ùð +endstream +endobj +36 0 obj +[333 760 760 760 760 760 760 760 389 389 760 760 278 333 278 500 500 500 500 760 760 760 500 760 760 760 278 760 760 760 760 760 760 667 760 639 722 760 569 760 760 278 760 760 760 760 708 760 639 760 646 555 680 760 667 760 760 760 611 760 760 760 760 760 760 480 517 444 517 444 305 500 517 239 760 489 239 794 517 500 517 760 342 383 361 517 461 683 461 461 435 760 278 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 1111 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760] +endobj +37 0 obj +<< /Length1 12740 +/Length 8658 +/Filter [/FlateDecode] +>> +stream +xœ½{ip×yà{}Ó=ÝÓsŸ=3ÀÌ`ŽžÁÌàÁ â%‘E +0)‹ I $DS¶åCÊŠ¥8ëŠílâ-ÇåU¼þᬻ”'Y/âÍf]Î:v9YT|eíÄ®]x-Å»²bs«î÷Þ Êñ&Uù±v÷û^¿~ï»×M„Bz?bÑáCGëÍ™SÏüo„ð ô?synño÷ !æ$ôÎ\_²å/É'áþ©s‹ç/_äj"ÄÌðç/=uN¼üò‰%„.ÌÏý»k'wÁØïÀ1z:Äï3€¹‚\¸¼tãäÛ¥›»æÒ™¹WžûâÇaþ¯ÆUáþ8Àö•¹Ëó¿õî KU´¸pm‰ÛËË Œn,^_|ÏXå»ñ7þ{DhCË­W^yñ û'Þ@öž?}|9¸yUî ìuæ.Œ•ƒz?xŽýĺ‹¾Ì~¡{Á^ïÍt߯B{*è6’á O8He¾¼äÆÎüâ/·àŠÐK½+þÆ°ƒ$ýÒßás{Î"þžbí{ã€Ã'ða{ó.ós¤à˜ç3è“xØÇÓ?ÒpÔØ8:Á<ƒNà=pÝN%'HÌ€>rœAÃÌwQŽ£ÁsÃh¯¡¸à{(£ ÌE÷Ð8ìi´îm¥kÀX:n žùÊࣹ‡Šbý#M9Ù㜚Â/§÷þß›à Ç“– JÀâûŠÚ›OÓý†‰V…Þ¼ŽDcñD¥Jgìl.?0X(–ÐPùŸ^ûÿÏÏEUYÜøáwÿõYå·G]¡2³m–ö==kÿ¥‹-'ZsqÕþ–ë«Ô\¦zàá™=ùÙlÍe«£¶;ux&ëNÍÖ\®JÍæ³ïšùn⫳ 7s7ñêl"ŸuùÊŒ;}}–Þ˜…ùøªvòm5W¨z9ü<¬n?òdÂE0Xõh×Ôf—T ˜v§^såªý4YäO`Ûe÷åm—+ìwÑá™[ó·ælÒOd³³‰[z¸‘•vFÂÈÂŒjÕþ%ÇWµë®X99cÛä§çÞaÏØgO÷¦ ã4²2,mß²¸5=—¿eßÊÓåòdrw +F}¤Ãš'<£Ó•¶­F³Ù„½z ØílŽõqËÒaþjÞ^í/ž·gMd]<;s Ú—¿•·oí»•Ÿ#ô!—šk1o“@_ 
à¹äçÞqê~JÈ£Vˆ¸u“°mÿÙü-ѵÏL$Vfk ¿ïý ÿ:àJ‡g<Œ?8ëMn¹(Bðah¼6TœA.oïqÙÊnW²÷tùI¦B)ÔÄ}@&Öú€€¼¿¨(õ>à@µû€€ÏêØÕû Ы7˜ŠÇN_Р1´ šöž?@úûî¡Ížé1™Ð›=<8³΂7Ývïí`ç_a<´9…BGÐqô8º„žG7лáüú EGÑô^ôZDKèzý*D©«è4ºŽN èô´íBÓh:‡fÐcè,øã&ÌùIø[Å“ø]ø“L€y™ù9{‹ýçp§¸òùO ŒpZø+ñ£âªôÛò€ü¯åŸ(§”ßWã>ÅwÊ÷A­¤ýKí'ú9ýþœÿ“þuã”ñ×æ@<¹‚æ5ˆ,„<½†º +‡*®¯åaiÍEuO„‹ÔÄ®QwÑêmÎT®â +M—3ºH7*®¿y›íuš«îhÜã|79—sx×·â‰Á;œë_ñé&ïJç ++ž¼Ã»ÊŠ+˼ Y¸)ÁdË<ç³*<é–7»ED?éVŒeUÑ¡[Ñéhµ7ºµÍn­‘Ñèó/È*xS§ÿßçEIVTŸ¶ÑãNÅ1j ™YsPÆVVfYl*ÌîÎààúóLaýÕõW±…­ÿ…ƒ +óú]íbëîgá|~ýÇ0æ?®ÿ˜)÷~잀/1¯¡$ +@ÐO.yþÀš›^%óc£c£[q3ÔÙ`š +8W(:øØ—8lhéPª\N‘#ÖýɇOÞÁ£Ÿá°®ñ»ÏþÞ_~æÌ®ѧÄú ¬3 ë<ëØè2¬“­»ÂªÕ×Üåº*ßtUÇ•W¼ zÇ ¯¸ª±¬¨²UqƒÆr(¶*]8ãÍ ºètá&\Ð+ª¬Cá>›ÞõYiO²#m‡)(!á¡Bu,æs“õX=Õ1ëÏøôd¬±½øØ‘´Ï½*ä†'æÿÍ•qÛ”9Ùþ+Ÿ>ýÑWLŠ˜ðíø½ßDßDϢ踨ßæ4$sìÆ Ý`Ó“õ5O6ÖÜĪ'#3à†;„¡#mŠE +XàæH{,Ÿ+õ]Òª<'Ib0hûT5|0¨ÉÚq-:-ñ"ÖøG%DÖõ¡?Æ L´}¹JÝcø5r¸Œ]­î!\¹Í[H}Ö©Û=éa"8_íϪ\8ð_â ˆº? +ó90é«ÌWÁêÛ¨gPÅ3µµ®Fä ûÖ°»£î:@hD¸éî\EÞ°i–‘<0:é4†-`o«†8?nµš@Y>›~a£c“ÒH»ÏéLžt9œ£¦«5.>kZíì±ðù¯Gjµõ— „±,<ÃjY-¦èü–ÑBË2NFÐaƹ¤3”~n.92ØI =ÄÔ5) )†¦Š/òE}ÐÊŲ¶9ú`i¸$J:#D…'ürî­áŸ}{ÑyÔÝNè“}kÝ&¡¯ ®ÝÎXÛ›ZÅËÈ@꾺ë_õ&¬5¢’éÍûÁò <­ãŽ›î–òäí¢Žk™?Ñ鸙t»s9_qÆ)KÆc£Ûq3[ÍIvnëá:ÉMâMvFÚ“0‚h£ìò¹¢#8@./Ô$KA•iL•ý +Ãó>%( g*¹-é('•+óÛÚ‡Êqñ‰[ϾãÒ1Fæ†Åû¶ª<.¢¥xPœÞ&¼À3 ƘÛß.—â™øVI®7cÍRøüTñù'/mgD–çx_È ùÊ¢AðàT ºpèv:—4êMÓ¾57n´Pwó«n²éùüT#Š RX2СÚ=fõ† ´ážâQ¯± ƒ +à±–)¤}«’ê3Wkša2ÑPÕÉVþF3­ ®þ'v0ŒoéŠà¬/²áo&.W«—Ùà^<3xÖîýˆÊr¢Ì¨›#Ò¬ƒ4y"M ¤Û‘ãAš1¤¹›º²2H3ÒtËàí;FÅKL÷¬z2è°ÅÇr 07ez¦ÒÑ¥@¨1Ó…VÀÓýªä@hy„Ø­Ÿq@|¢‚ž‘œ²søH0Í€À…ö6ìðùœPÙÿžìVð˜gÃÏØá‡Î¶‹ÑÂø¥ÆFΔñK‚YÊø£¦&ñDP¼b¦¬v >ó±‘ý…÷9ªÀbì“?Mìœßbh"]î<\ho±y"ä€:ÄeâÁŠS'É,‘1ßYZàÆQ×$< +ªk]–ð(¯c)c‚À‰ p%\Q ]Aæƒ H…Ò$nƒè&õQ°ëI æ Ž”-æÅåŸl}ìÌÎrû'ö:? 
G_\ž:Í1|¤zlâC¸ÆÌ6ôêgg{nwQ ì ùöyµ1q¤1ȤˆM2Ë m#})&´¾}Úšë3n›q*H“x Å7‚4¨¾QôF[YÐ/âl@«Š!˜8žÐ`¸ï€ð‰H0âÊZùÃç°ìÓM¹¶þF­F ´>^='ž(:,# +¯àV8ê0³õÚÝk‰¼Ì±ØÚXƒ ò†Ål’ßå`)ìF7ÖqcTSËjÇdê¸Gh0Þ®ž>É€o­R7ýuf&©ÜýË8ªn|¸·TÕŸCôeÔµÜp«G’HÚB—jB¸j‚N$A',ho¥±¹^¾éÖ· yPâ„èeQ€À¼,‘³[7–+õ2€Uz®Ñ³CÎ]öfÀv¥N’Vµã:·ÖASŠ ;uÀߨÑÅðÚ„v†ò:ì3 +Ùg²Ï0dŸÒ +³,J‘(åŒ'FîKýÚ4Á&Ä‘í47ÃôzÂ9i§u'š­:'쌾þš-`@4q÷u&®·v÷åZ,Îhw_sÀoLÜûø¯"ÐnÔˆ6ÊZ—!Vi“ðXÜTG t$xsÐ.­z ü¦ÀjÐäušÄc8¬`šo5Çh($žQÔ™Ð>„¬{ëŸý –ôTi{éÐp$©²= x¤™JsøQ|}ýÖúoà…ØÀT},ŽLj—Ähn[¹e‡É·oÙðw ¼®¢ýØE=IâA8'n$LЯQ¶§{‰L8?h:àù*iä)[);Ó ñí:ß^¶H4qÀŠ'€«_/×öGÆ#öhuG>ù¾ZK3Ÿ+uR~GêÁØaŽùM‹Q’á±Ó•=ÕŒ_Âípa¼ÃŠV‘¡‘ˆ4¸ÿK€¿ Qmu3D[Ìž¶x% !’Ë°@B„„´2•@ HÈ6Ý  ª<‘ÐM7⸡Ï¿ãj+,êú´Qœe¸FúV„¼‰×Ò†… ¡ìp„ú­"SÌ›A6жRì¸-,ÆS°+¿Ù!ɶ§eãF-TÇ®C}ÄÒ$»ÈF@Ñ +cÁ0¸ mXgÓÎŽÙÙd¥8•IÇò0C€çö»úX­¶vÎì:ôl!“›ÊT'ìŠ[lKUš¡x³|Ñ0y^€ +Ç>~dýN-j´NŽ§œÊðC¬h³€Ø÷£2߀Z¾æP7JxBR>oxB̤«Æ4‰¡Üˆƒ›nR†èŽB/©óƒ @&¤Ô„ Ñ^GÓõr•M¯Kò¸¦õEš©ŒEÒ"2m’gC¤œ°Õêöƒ¿> î˜.ooJ¦å8/°+ +$´’] ù0£KÉzÙa®‡£Õ;¥›åçO5CC.IœÈrL$9’Iñl:ßL„òÄ €|t` £n‰—!Ò5y3Î +¡±Q‹“‚_Êؽê¶MË"b!¯sà!Ò,`Íö8Òsv?}ô€Ä[âp$ÞJ-3jh(50a' ºeøà¾Ù×~ð³!^}¨$É8šn¤"VÌn¤ÂV$õªÞ–tµ@äBð|äE ’I‘Z†º./r!j—%8—‰\†7c# Œ9Rõ&•KsÕËÅ@G­K53c‚L¬ŽÇB2êò„"’NGF¨)âžöI9ÚG(õË9(*`~evÏxµŽð5gPžØõ›ƒê Œ +üÏŽaIÒý!5S «Ìß½çoTý¦v÷Ç®üZy8™NBiò†(H<N§chÃÿÈ!>r®!MÒ(U90ôš Á?MÌp€¨B@L7ÝA‹ªQ™UÚôD3â6 C¤ +Ó’­EÂM„ +Š8\yúý‹ÇjVuýÇÜ»«ûªSžzúÏ|-ž^{×Á{«••Êc†Ïûýš\}ϧ¶ì z“ÓÓÌw‘ƒ¦Q·H}8`êFZ^‡ ¶ÙuO ©×I^Âö<ä\ À9ÀÓÁÓU;=ÿ7Ò†ê2Cp†î …{ûI`ý4¢©™ÏejÊ3ûC,ãッ”ÊŽ…C|8Ðd¤ Y…_j`æ°Ÿã·Å¬€áé]a„²¾¬Ah²ü>|±(z¯ÉñE;Ðaôvô·¨û áz òLÒûà×¼´Ýlºº·$Q|äA$Q$„=^wåUïHÙ«æí~dq·ÒtwÌ\Ý=btåGÁ7 kËû•*·GzO­z;90)F‡À¤¼GPs:E’Pì3»!+Šê ‚Ò¦íUÚÝ$õ˜è¸šÞd×n-à2·hº\Ç}$ÙP…<}ÂìÊùAâŠuPødŽ–I‘öX˜i…9Jä-«%¢ë?¯BØ ø¨¦´Òaûë„2u#¾0ÄjcðØ~¢ˆôèÄ‹²“¶^ãÌø‚å‰rÈÇ0¼)Ä3• Š¿·÷Üh‚f¡}¤õß¿Ç L¨gcõB9šð…#©4êåeÛ䉜îí‘°Hã#ˆ¤+:†dùõ Ñ„ýDÓ 5…>5 –IÌD²MïÉ41Ëz‰ÙIR–ÜGÙ3°Hc—&f!ª²V$͵šcÄõœOÔÊi÷¡ò43ÛÖŠDê ¼ èzPÝšŒŸSLCñs’øfØÉ”Ÿ«îÚ IÙàï%ƒ¸)ÊóÕ¡£¾0<'3¼€z¹Î6ü:Кƒè +¹NŠÐ:¸A«",­Ó¨ +dæ®<Ð8DhŒC6nþ5@3Aí<6EHL‰ $'$j› ‚i÷vŒhÚ¹ oÏŽiÆ@“òë'}!ËïW|¼ŒËÍ2Ök[v&‡;C‰æcUUmd~øÌ —9 ׫wÿ”JWjr¦ù¶­ŸŠëúÎÔPÏÇá³@Wèêyc?¡ˆky!ßæ¦R°Wšn 2èÍ=h’&“¬’†-â‰Ék;ø'Ð×g´ ˆ…² MdptwÑ5‘†Œ\*&ŸÏ¨$üΔ¡IF![’߸*B¶ãkfµØgCš"÷_ÈÁ^i’݇p] ±ˆfe‰…´ê®¹z;ß 
ÝÀrS1*·+=° ‚ v"p só¶¤e ‰Ä•€Ë1HNoÓ k.sfŠFi+àÆÞÜ4ÀdÓÀÁtÏ`¬¦û4þB² +õ$Ÿ¸Ž «òƒ{ë5NÅŸŽOmñË +hYv²²üñ°•ËŸzäÂŽir;#ƒ…‘dÜ/øwŽ¾Äᨠ)BÇŸH4âQÓ''· >²«A£пÄ<õàTO>›µ€«´Hjí HTë ’Ëu'jP”è[ä•¡>pS›Ù-m]ýw?U¬¢@Ó×\ò*(T6j'až™á`89Áˆ*Ë<ÝÙ–CUHåŽ×jÇ ]×ý_æ ~¤˜OêšUܛ޶Öo€÷OñW福%zU [–R?}æã­ÅÔ‹¼ßPÀZ,¨ +òMªm@€¥€­"$# $ä;›1¸nn NÞÜaúN+B=÷ÙqØ­U'f0—K;NT#Õ*®Õ‡r»êv;À¾ˆ­j@Ó×ÿž3¸jjëÞïûý‚(—×_¬T³a+^Ë  +À4Øè7P7¶‘O{œFPÏÖûïÆ3é›nÆqÓ+ž¿ã*+ž¢ÞôDE’8W]ñBÐ\áÉ÷’¨X•e™œ»Ð¾o Cî )YVÔ`(¥û¶*~IçææeIàQš÷DÈîÝRgŠl!65#:Ëž:­7~ƒa„kr"ñÕy~QÂÿ.©DDuý[¸ZSEI¬<‘B>?~[mÐþM }ýVÿ}D¹ÕÕˆÚ¥€ø±º;N‰Ç‰›œ‹7±â5í;nnÅÅÆ2ƒÉ¾KÏ1rv›Æò@3Tíû¨f;`i]¸C¾ø<K䚘yó³Š_ìé½dÐF@=oI¾ƒJ·WJ‘Ï!DA¤§tÇ´Pìy£˜>œ ƒ£¹¥¦†EI‘$â@R;¬˜Ñá”ú«ªó¤œnEãÃIë¤ +gÂbp¢”lIÚ˺Ä\“ —dˆ"ú'ý­_Wåpìp¹r€ò,}¯Iõ¥>Žz&êoAìóÕ^Ü콯qèûgŲw\qÅ+U‡%Ç­‚®ØDWÜ’±¦•É›™`¨\uê‚xÿÖþåÝ=¦1¤\‘3v§Ó÷๙Èèæ÷#à’EjýÅÂvÜ¢ïJr4)Ë øáq‘{@Wõx2Ï}L·,n× 2 ¯Ž¤¡Œ¥ `m¼'c/åϨÚþ‹‚¬&Dßú¶A}Iã§,¸¯É;o‰œ¬à›ÁÞç}úÑæGŒ´ñA#Fzÿ‹Q{9Ôé·Y¨¶õÛJ¢'ûmƨßPýÎæW”9ôù~[C_Cßì·uÇŸ†Ù0G>•<Ž¿Ôoc”d¢ý6ƒ$¦Ýo³h‘™è·94Î|¦ßæaüúmÚÍ¢~[Ãgw÷Û:js³káÊõù«KógíÓOÙ;®œ½:ÿ”}ܱÏ]Y¸nŸ»ºpÙ~d~Æ>·peéÚ®ÙGç®\³Î_½xnçÂ¥³ÓÐ=½põü¼Ýtöû­#l2ÄÞsciþÊY˜‹=<^­5Î?6îøüÕk®Ø g æœ>ÝíTà2ž¯‘á––·Ôëg.מ¼rñÌÂÙyçÚ“WÏÌŸ#Ø8Wæ—ê\¸xÍ&ÚGÎ-½sîê¼ —.ž™¿r °yÖºj/]˜·î;hZœ¿Ò|°7 joà1ì ; +¬ÿ,™æÌÂâE˜diáü4}ÐYº±TU殜%OÎ]º¶`Ï]Ÿ»xiîô¥ùÞƒsöôŽ#öÜÒ»OÖµ3W/..]s®]¼äAu˜äÿ}í‚ ä +ºŽæÑU´gòÍñiôœw@ÿYè§ÐqȤltÍAÑ9¸·€.Cë3C{È\KPûíB¢cÐs”Ž¿F[d…‹0f'Œº3O÷G“ëUtîÛ9àgl¨¡ít{síA7(æÛþäéa°©øùÌÙ€¿Î?{¾ãtÔ5Gðµa.õñ¼#—Ð"¬X‡¿3ÀXï{F“çxrú®BÏ<åPVFÍóuàÞOðÙàÁlÆ.¡w®W)gz#.Ñ™ç)þ=ZŸìãyÚäûÈyúô>t®‡³yŠó›3|Ë U*Ù·Ò7 ¸‘C¹³·®» ¡qÚ=L–:OiºÐÇçp@=ÌzÏœÛÄdh;ƒž€óùþœW(ÎÓ€£#ˆª€Å¥pcÍ9À…p”´®Ãqà9ÐØKtŽ7Wœ£Tï@Gh{‰êÄ[¥u V'Ò_¤ëP\‚kOBõ>&ÿœgúÿàÞ³ô;Ôø«PŸ¯ÒoG Ba̤ˆJ@oôjžk¡ÐÞ1ÐÛÝ ‘Óè´¤ºf?Vx= R~ôù8zlo@'ÿ/Ã@f +endstream +endobj +38 0 obj +<< /Type /FontDescriptor +/FontName /e1b069+CMUSansSerif-Bold +/FontFile2 37 0 R +/FontBBox [-1331 -440 1929 1170] +/Flags 4 +/StemV 0 +/ItalicAngle 0 +/Ascent 800 +/Descent -200 +/CapHeight 800 +/XHeight 0 +>> +endobj +39 0 obj +<< /Length 1278 +/Filter [/FlateDecode] +>> +stream +xœe×ËnÛF†á½®BËtHs&Ã@‘n¼èu{stÔ’ + 
ß}ù½¤ikÀÆ/‰œy¾_Ã!}øôôÓÓùtß~»]ês¿ïÇéÜnýíòåVû¾ô—Óygì¾êýë+þÖ×|ݶ“Ÿßßîýõé<.û‡‡Ýá÷í÷ûí}ÿáÇv)ý‡Ýá×[ë·ÓùeÿáÏOÏÛëç/×ë_ýµŸïûãîñqßúØú9_ɯ}à´Omûütÿ¸óϼ_ûÞòÚLL½´þv͵ßòù¥ïŽÇLJ1wýÜþó‘9ç)eÔÏù6=n?[i(JKiU:J§ÒSz•2¨Œ”Qe¢L*ÊEåJ¹ªÌ”Ye¡,*+eUÙ(›ÊNÙUÊ-уÁkä5x¼¯‘×à5ò¼F^ƒ×Èkðy ^#¯Ákä5x¼¯‘×à5ò¼F^ƒ×Èkðy ^#¯Åkåµx­¼¯•×âµòZ¼V^‹×ÊkñZy-^+¯Åkåµx­¼¯•×âµòZ¼V^‹×ÊkñZy-^+¯Ãëäux¼¯“×áuò:¼N^‡×Éëð:y^'¯Ãëäux¼¯“×áuò:¼N^‡×Éëð:y^'¯Çëåõx½¼¯—×ãõòz¼^^×Ëëñzy=^/¯Çëåõx½¼¯—×ãõòz¼^^×Ëëñzy=^/oÀä xƒ¼o7à ò¼AÞ€7ÈðyÞ oÀä xƒ¼o7à ò¼AÞ€7ÈðyÞ oÄåx£¼o”7âÝþj·ùº«üo—‰$‰JI•$’$*I$IT’H’¨$‘$QI"I¢’D’D%‰$‰JI•$’$*I$IT’D’¤$‰$III’’$’$u>áMò&¼IÞ„7É›ð&yÞ$o›äMx“¼ o’7áMò&¼IÞ„7É›ð&y¼U†o•aÁ[5ñ‚·j¶oS orÁÛ4Û‚·sÞ®>,x»Â/x;³áíê·31ÞÎÄx;ãí +¿âíJ¼âíJ¼â특âíâ¬x»¯x»¯x‡+Þ!Êw¹âB®x‡b®x‡+ÞïrÅ;àà2äÍkçÎœñJ¼Y³e¼Y†Œ7+|œÆx‹ oSŠŒ7+[Æ›ețךyÞ¢oVûòæµ– -ã­Œ‹·é€"¯åþR¦W*–Ó4Xq”êC™^Jú[(£à®^ƒ1»¢y]•¡Èk¹}¼YM-x e#e®ÊÙ_y+ë¡hÜŠ7k¶:û«/ âåæZå5«dUýµl¼uz5n¥¿[ç¾ÛeBø÷&Si|y»(ô%q& %+S%ñ*Aª¾ÛÊBijPUÃ6\¥h,ž(¶+L¥ºÝ,¥‚4G©5Þ<¥ô…ÂcD› [˧©ñ¡*t› [ jj¼aijü¼IµÙxʪw¹g¶FÉ›wžØ¶¥§Rô>¶Þís¡hâ>½:­O¯VRÇËí¾³P*#àÙi|e0¼Uýí,ì*o§¿Uáûì/ãÎþ*[§¿Uê,”¦Nö¹°•m€lšx°:š&\}M³¹»)üÒðL2„´\%CHËF7æÕ§‰ÇÊ»BŽLÉ…’)æjæ€Æ»J1:%¦H߯:=Së©ÿÛ³zýr»méükÀó¹žÌOçþí¿‡ë媳ôû7±Ùð +endstream +endobj +40 0 obj +[367 760 760 760 760 760 760 760 428 428 760 760 305 367 760 760 760 760 550 550 550 760 550 760 760 760 760 760 760 760 760 760 760 760 760 760 794 642 760 760 760 325 760 760 580 978 794 794 703 760 703 611 733 760 733 760 760 760 760 760 760 760 760 760 760 525 561 489 561 511 336 550 760 255 760 760 255 866 561 550 561 561 372 422 404 561 500 760 500 500 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 
760 760 760 760 760 760 760 760] +endobj +41 0 obj +<< /Length1 2224 +/Length 1128 +/Filter [/FlateDecode] +>> +stream +xœÕ–_hEÇ3{i.M{iK´:gã]rwm"A0mÚSJóÇä %Xe¹ÎÝmr·»îiÚ±þCÔ–‚)¡/"}¨/bA!èCÔAú >øоT|*ÍùÝݹk¢–<;ag>¿ù}ç÷ûÍÎÍbDÔA¯F£“3Ùý_¯±g0;[¬é61¾ öe<]eݵëu()þüñru±ôå…gˆøuØÇ*R?qûÌåÝàïð U0ÑzKÁÚì‡+5ïdüú ö‡AÖªUÔi'bý×A¼š~Ò¦º/laê59úýÔDì–ã¶åzõ7( é[ßv¤}õìk@¾Œ®•‚½°Žøõ=Öï/Þ÷ä_¼]»‰úöåt¶9&mŽä£FúXnm™æBý{Ú\i}ÓÂ~B½AãMO¬ƒßf種·ùç°ß‰Fö eoáí-œsóÖ(ßÝ6Þ/Þ ¤ö|Ý£¹XŽ®K|“’ì¡p\ WFk;€[hÓÆøæšÿw»ÊêDï¾àNü=Ž“Á ³ÇøN ïz8—Ø«%’‰ƒ×®%Y[’»ó&_ºS +Þä + mDäºvtsm¯6ÌÎó+·R«Wf. eƒlkïÚêÚ +;ôâ?Ên4O'¢˜Ñ6XsdP¬Q/ )ŽAó‚âê¤Å[¨›·£îsŠ·Cõ‰âb~1ºûé Å uÿ¨˜Cõ³b «W£›ê늡gÝŠ9íaB±F]ì©9º${Vq wsêf§k”`ç£Û€n»¤˜Qœ}¦õ°eÅí`?„C×Ã~UŒøœsêâmŠ5êá÷-†s¡C|X1£Ýü´bNü}ÅMñKŠcÐü©¸…z´mŠÛé Ö§x;mÕ'¨SóÇ,{Ñ1ʯ·Ø'²ƒ™¬ÀÌONfÄÔôä‘ÃcØ[8²ìWug:ò–éå-§,E6=(FÄ ìl*— +‚m˜Ÿ•ŽkX¦È¤÷ ‹ì'öeÍ®ún*c§”¨âyöÈÀ@4]B7íZ¾S”¥ ]zÞC*ñjUm«ìèvÅ(Š¼Ô=ß‘îQ£Ȧg̪ÕPBS€=å +¯ç%]„‚å{Ò¥Ítâ9×—Õj˜R6D%Ã-V¤)Æ—ÊU£XY†'ÍÆ3TðÝS>Ó7Ë®îÀ?a95ž¦.ÚCEEУ2ò|Ï“ò†ªá¶qQ`»¾iü»$± ÍštþY D²ézZÖ¤4!×m[Vù…u5ÑYdãú:dP™*äá©ò, R½ qêGŸ§Iš Í€§hÖ:ŒõåÏ Ž@‰8>UIOo°òÈe"C0:˜—a–4òiæù¯8‘?K)ÊáiTvoý,,‡\ì)Èh‚,ûhìÑÚ|䨪b‹¸%UŸ µ ö£ˆH¥f½iš‡þnÞYÃïeØêg({O=Ãÿüo£†¸ +endstream +endobj +42 0 obj +<< /Type /FontDescriptor +/FontName /28d5ce+mplus-1p-regular +/FontFile2 41 0 R +/FontBBox [-115 -343 1403 1075] +/Flags 4 +/StemV 0 +/ItalicAngle 0 +/Ascent 860 +/Descent -140 +/CapHeight 860 +/XHeight 0 +>> +endobj +43 0 obj +<< /Length 228 +/Filter [/FlateDecode] +>> +stream +xœ]±nà †wžâÆdˆ°éÐ!Uéâ¡MU·€ápj@~û$J¥wº_ÿ}èçøyx‚/À?r4#p>ØŒkܲA˜pöõ¬7å®Z7‹NŒ<îkÁe.‚”Œ’¹–¼ÃáÅÆ Œ_²Åìà ‡ïóHzÜRúÁCŽ)=ô¦Ó»^xÃNƒ%ß—ýDÌßÆמDÓý-Œ‰פ ffd²ë”tN1 öŸ%nÀäÌUg&-vu{%ÅsÿÔ¨»_ùúÃG.³åL‘ÚZ–šÂ|\*ÅT©Z¿noû +endstream +endobj +44 0 obj +[290 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 
1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000] +endobj +45 0 obj +<< /Length1 6272 +/Length 3928 +/Filter [/FlateDecode] +>> +stream +xœÍ8[l\×qsîsŸwŸÜ%¹|ÜååKÜÉ¥¸¼zP”(Q‘HŠ/=–‚byE.EÚ—"W´]4Ž'@$i>šq 5Š u +äܥ줭 M‘iŒ nÁ… 䣅­ÀŽÔF­ ";çì’¦l%@ûÕKܹ3sæ̙Ǚ9g Üðˆ0>6•ê¾4øÜI’EîÅ™¹åÚ*£éwðõϬu›ióÈ_Æñɹåë7ž——„ÿÆW¾¾øÌÜàûÊy‡]Ùù|nö}íÂ2oâÛ; µH~Žº‚H7Ïß(>]ó¶XBš"Ý·X˜É…W‚?Cý¿@ºáFîéeÒ'NàxÒúRîF~ìëÍS~´O~g¹°Z$O’¨Àñþå•üò½ýÊOpLBúKÀ|ƒ´rø¥ÐUÏ‘¡Q|‡q~rõ¸÷uì½ÂÀ†oùÁyâwø;±€Ã,TÁþGãº5ø:Øñe3dSc)!J>À/ë«P_fßo‘_@ÁuPDxä3>wjÕè°&ê;}hÃá7úÞ¨ð8È(®ó\d&¯ÂéBûÞ‡ ´vyã8v¿íäó`˜%¯Ã"÷ŠÇ\ˆ*^{ÿC>A ”úÝ(Éü«¨6»Àé­öÿ硧¦mãYzvmš‚q¬š*±ìÑiÎûÜ´þO”’Õ Jâú¿RW,A…øðdö”1MP1¾P­Óñl”L'¨gS£Fô÷²oE^ŸŽ \öAä½éˆ¥r,K‡Ö¦ùÀô4ê“ãî+—T‰[M䮮߹r%BÕ¨q«™³öX¶¸ß§›©µÇõϱEþÕèTl9cèTj=Ka<»ž_Ïé é‹D£Ó‘uNM–)¶ £l7â¢Fg\ƒ»ãŠë)ªÆ®duý´1”{BÏê³×Ê*˜œ›­ŒKëëúéõ¡œ±®¯|9ƒ)§(‰þ1È3çh|¥£[ÕÑhDßZÇ0à¤3hÍ…ŠmQ.æ‰úVeqCÏOE¢”Lg×Ñ¡3ƺ¡¯ŸY7rlBy +û$¨—¥Ávû˜ ñÂuö1rO<¾ß65G'Öo³°5ÖUªgD6§¸~Dv¾DÉW`˜ÚƳ!_¶†X´¨7Bp‘ç¦ëÑ«+Y ª~ŠŠ±“Ô¡Ÿ*©ÓBŒv$5†„}¸Lê¬È»ôS ÎÏġ5ä¸QÎuj—œÖivú{Mø®` o@?°ŽÒcøw zaÚá0tÂi€ËpFà +<g!'¡š!Q¨ƒP 0ixÂ|›¿/béü‡pGø‘xM|Sj–¾'åm%Ž£28À!ü +; ˆmÛŽÝÁ¿‚’C†u¥-¢Þ£²Tüغ õ¦(lÝ•œ K1ªtSÉ[Í£žî»b™éÛ¢½µ–äº-Q))Sצ¥j÷%êÙ´Ûm™Ú’U6-M»/SÇ&U¼²b ÄpІÊ6dɈɌmßc«H¨Ævx7œ ÙK;ËÒÈvï±QÚͤᇒ¬ØnÍ“¬<ä‡2kCN—{—Cj tve|Q_‹¢vQ$>‡ðo²$¸ý®!´n¿·ý À¯IÐ!üç÷ ’Àƒ ¼¾ý.Êüxû]¡µÆË;¿ á»ûvÌF©J€˜U¯Þ+Ù$DšÕ{„`‘³j÷h-Ƭɳ܈wlYMµ>¿e«2M°šë·i¢M=™~©»A¬ 
+ªšd4%…L0”îÎô“ƒ=F“ª—ƒGŠu´Æ«ê}æ`wM¤³»a4Ó<ÜÓÜóÙL¼§ÿ¥ñßêh³ÍV{à3©‘µskÇÛ\žÖ‰CÃ}ŸxûسæöÀlÞùPÆý¦A#| Jv´ÚÒ%óÁ>¨ +2”tIËNªs_¿IÃ&•M(ÉŽ0KÑ+2?*ÂûdÕû}~ +<(½éîp -ÅhjÝ I¦÷`O’MÊðצGŸþƘsbâ؉±ãúøUñ€ç@ÝðÍo®þE/9þ Ùs,?Úßó»¼›ŠZ]ó<ðŒì|HþÏÉj^-I…$[Ó—× ß·ªknÓjÌU æ*ÌŠ¯jÓòx±åzéE¦Æ˜îMjx7DCÄhØ» +W!‚œ(ã”îK‘h–p1xYt{C5ÑÝáíW±³zCáꚨ‘L~:OõX–ê4y¦Ò*«[|qk‹F —õ0© ÁÄ_=£ _ÌÛ'.lýcýòÄãí]çŽÊ†±)=䯇†'&f«:¶ß$M!ï‰?¹G¢!߉®t$ðÑ^LÉo0¦!ø^9¦%‡3„Q,¶½ãée¥NQm íºG«?ÌÐÇÁtºnS'ÆÍ…qsðN¶IC·Fi#È` áC±‚ Ñî +–ä`bGM0ôÉÀxYkP?˾ì‹Fáa÷gCÛoí÷Z€ÓØ·íسœxΟ,÷m˳[ÑU¬OWó>íBo]Ø¢üxæ(ˆ×lY~fÇ.²fTåagŽbîk?{½µ·ÜvNa4÷Òü¹kヱsOIÍÿÙÒ+OLÜüó¹ '̯MeîðÚ¾sTc4üµƒYpîž‚ÈK‡onÖrð>A5¬¤†­Ú²lhQã§s¡íåBÂþº!j,ðnë8¬g°„œ}I¨3i}%îºúò™_Þªej_aGŠ }—³PöºM ìs[ÕHûaÇ…+GÌ?ºz¦«gð@ûÐÚáTÛI¬ü Ïu¾vlí¯Ì_“>ç\Èœîðª¯ó½hàY²y‰AJ, +2F!ȼ°¬ÄSÔ³e5 ß ޒdžYiA<¡hðù_–ÁH´O °ä~îшoƒxÂ-ì –›ß1Ò p;“¤-‰W‚~5Á¤€VWH¸T5¤ŒÚªÐÉ澞Ž ¶hÊ×~Þ<5XS9¨LÄ&½µ©`ÇÄÁlÓIM õÔE">ÑÙÓRÕÙÓÔxÂír¥j"5šêL·×<ØT>GЯ„ÃÐFæ¡T+”7Ù¾«Ï–Š«ƒ£ô©ÜÒª>•_Y˜»¶¸póV~G† ++×ózw²S?¤?,¤W¤p «/Ñ›èîì4-q1¿²ºPXÒ;“Ô“ÏäÌÌ\ E™$LTç‹ÅåC©ÔÌÄ­¥…™Âl>¹Z¸µ2“ŸcF$—òÅÔùù…UÙ¥OæŠOåVò:2fòK«èç­¥ÙüŠ^œÏëSgFô±åüRYx¤,×wMéJv%\Ye.S3SX^@%ÅÂõ<ªXÑŸZ(Σ2a1Ò—s3Oæ0 KúØÐH²øt1îÈ-Ͳ™¹ÅÕ‚ž[Ë-,æ®-æËsúÐñ =W<¤WÜZYYX.®&W“èP +•üö¼á` Ö û­ˆpïÿ×ð€Ç‘?‹Ü<§.Bá8ä[@yæp¬7;2YÎaºŠ°ŠZGñ÷¹S\~•cl…”Cý‹ˆÝ„[ȪÌa߸Žë ‰¿òu8„ïïÒ¤BWyFôAzñíF-X_ÿùÈ*ÒÌ.ç'!S±gí,Â2®’¿ô=s–Pv¥gqfg·‚œì_ÚÆ^Ç>Ë^w×æã2âeKŠH]ç>ÍWìy +ÇU¶¬> +endobj +47 0 obj +<< /Length 1278 +/Filter [/FlateDecode] +>> +stream +xœe×ËnÛF†á½®BËtHs&Ã@‘n¼èu{stÔ’ + ß}ù½¤ikÀÆ/‰œy¾_Ã!}øôôÓÓùtß~»]ês¿ïÇéÜnýíòåVû¾ô—Óygì¾êýë+þÖ×|ݶ“Ÿßßîýõé<.û‡‡Ýá÷í÷ûí}ÿáÇv)ý‡Ýá×[ë·ÓùeÿáÏOÏÛëç/×ë_ýµŸïûãîñqßúØú9_ɯ}à´Omûütÿ¸óϼ_ûÞòÚLL½´þv͵ßòù¥ïŽÇLJ1wýÜþó‘9ç)eÔÏù6=n?[i(JKiU:J§ÒSz•2¨Œ”Qe¢L*ÊEåJ¹ªÌ”Ye¡,*+eUÙ(›ÊNÙUÊ-уÁkä5x¼¯‘×à5ò¼F^ƒ×Èkðy ^#¯Ákä5x¼¯‘×à5ò¼F^ƒ×Èkðy ^#¯Åkåµx­¼¯•×âµòZ¼V^‹×ÊkñZy-^+¯Åkåµx­¼¯•×âµòZ¼V^‹×ÊkñZy-^+¯Ãëäux¼¯“×áuò:¼N^‡×Éëð:y^'¯Ãëäux¼¯“×áuò:¼N^‡×Éëð:y^'¯Çëåõx½¼¯—×ãõòz¼^^×Ëëñzy=^/¯Çëåõx½¼¯—×ãõòz¼^^×Ëëñzy=^/oÀä xƒ¼o7à ò¼AÞ€7ÈðyÞ oÀä xƒ¼o7à ò¼AÞ€7ÈðyÞ oÄåx£¼o”7âÝþj·ùº«üo—‰$‰JI•$’$*I$IT’H’¨$‘$QI"I¢’D’D%‰$‰JI•$’$*I$IT’D’¤$‰$III’’$’$u>áMò&¼IÞ„7É›ð&yÞ$o›äMx“¼ o’7áMò&¼IÞ„7É›ð&y¼U†o•aÁ[5ñ‚·j¶oS 
orÁÛ4Û‚·sÞ®>,x»Â/x;³áíê·31ÞÎÄx;ãí +¿âíJ¼âíJ¼â특âíâ¬x»¯x»¯x‡+Þ!Êw¹âB®x‡b®x‡+ÞïrÅ;àà2äÍkçÎœñJ¼Y³e¼Y†Œ7+|œÆx‹ oSŠŒ7+[Æ›ețךyÞ¢oVûòæµ– -ã­Œ‹·é€"¯åþR¦W*–Ó4Xq”êC™^Jú[(£à®^ƒ1»¢y]•¡Èk¹}¼YM-x e#e®ÊÙ_y+ë¡hÜŠ7k¶:û«/ âåæZå5«dUýµl¼uz5n¥¿[ç¾ÛeBø÷&Si|y»(ô%q& %+S%ñ*Aª¾ÛÊBijPUÃ6\¥h,ž(¶+L¥ºÝ,¥‚4G©5Þ<¥ô…ÂcD› [˧©ñ¡*t› [ jj¼aijü¼IµÙxʪw¹g¶FÉ›wžØ¶¥§Rô>¶Þís¡hâ>½:­O¯VRÇËí¾³P*#àÙi|e0¼Uýí,ì*o§¿Uáûì/ãÎþ*[§¿Uê,”¦Nö¹°•m€lšx°:š&\}M³¹»)üÒðL2„´\%CHËF7æÕ§‰ÇÊ»BŽLÉ…’)æjæ€Æ»J1:%¦H߯:=Së©ÿÛ³zýr»méükÀó¹žÌOçþí¿‡ë媳ôû7±Ùð +endstream +endobj +48 0 obj +[333 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 500 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 517 444 305 760 760 760 760 760 239 794 517 500 517 760 760 383 361 760 461 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760 760] +endobj +49 0 obj +<< /Length1 13696 +/Length 8659 +/Filter [/FlateDecode] +>> +stream +xœ½{ip×yà{}LÏ=Ýs˜™îi\Äœ˜0€€À )J‘$ê€HðH R’-J¶ÑfÙŽåuy“ÔFq´‘Röf£×HVÖ»[ˆíJy7»kÙµE»¢¬íMi¥§be + Üï{ÝAYΦòcÁê×ï}ïõëï>^ %„ȳD$î9\­ùÉÓ¿AèÑçg^øîp…á ÀÂ'..ëâ£ÂÛ„¸t˜8µpúü§¤ò1BÄÂùô¹'O=xæW éÝ{fnöäOo>‘ƒµ +×и"ôì…qç™óËO +±™‹Ã=±±A2•ÎtdI.Ì4 +f'éêîé%ÛúŠÿïwÿÿùc¤ÄHd?ë;0Íî¼8ÈyG’¹ŠÓc3öôŒþßT’eFKúŸ0±Ì„ÒþCÓ“æŒQfbélRg¦ 61SfR 5Lã©éÿ‘ù¯3X7ý^æ§3Ó`rqšM]œá33°Ÿ\ +¿¯Ì\%«@? 
o×?yüx†ØF)Y4± r—šު–™§¤?/ù&l£3±kŸ©3©ûNFL_»:«cg8c3™«|tÈá ½6vjF5`G_Iÿ.'Ç_Ò«L)ŸÖõ=æÔì£ú´~ò{ \À7ëõ«úž«S³æUýªÉ_gâælV}`s8€g‚üMc×’†‘ѯ]6ÀCû›#n_*™ú5çå¦>½ÿpÆ`tfú*´Ï¼jêW÷]5gñû¼•™ŠbÞ€ðû¸Š7söч·R‚FJ@ÄÕ+ȶ;OšW¦˜ÞžY›)ƒ¼Aoþ +£Ÿ!û™ûÀ´Eégg¬)äSA¢‡ óìL¨:>M˜¬O2±¸›)úä¿#2ý?D(ò±KŸl+Y{@™ÛYå¨{D(ZâÔEzñ!Ÿð7dâÞÔæÐ ÜÏÜÜ\0s³pó"Øâ«Â îdÈ´ý¤÷;H“ÁŽ’).’$9ÒA¦É6Xñ"üû>MÒãô÷è_ Aa»Ðþ^¼K|Gz^ú”ÿÖuÙõM%©\vr·=Ÿ÷Ö¼_òÅ}Ÿõý™ÿ¨ÿþŸú|8øJH}V=©~W»7ÿud5ò£h'ì./xA¼¶H\Ä%DþÌ©ºªÈä^©ÈÜuæ­2c•ò1ej•‘k–ä¾Î\u&©VY¨n‰îë–Ý¡´%ù¯HLªÈÌ¿f)Ê ‰…Ö,—ûŠÌ܉¹Ö¬ rCfÞ5æRWd—;R„I7ì´"KþHQF°g¬À@ !Ø«®ø¼A{ƒ|µÏ^ àÀ&Vp5ùª$»<¾@0TqþèWeÛñ6 l"MI­¿©ZWDŒ>š¡bDóÒë»^zɤ£/þæo¾øì¿ö +ó^àÒÈ{¯R]Xÿ9õ­cý/…p«?¤{hY8þs˜´Á»W=2QKriUv“©èÜ€‰–\ +Ó¢¥-Ìļ?20TÇ¢JÔ,t÷t÷ tÿ0Nû3Q$ +TØDãôû…Þ£»Ê¢› +·nþ[º*¼Nò #”é\a÷õv˜x‹+$œð-ƒùÎãb÷ +!¤±h<—Y¨ˆVqlwÃë ‰Šàr:~b*òy»ÇŠ´¾çÇvåƒ^Ù­’ÔlÏ-|¥?ô†ò»¦Ÿ#üýÓðþûáýè)óU­¨|½í‹â»}yP^¨2÷5+ „šÄÀ¸Ð§ƒÝ=ðæÁ¡F='Fƒ"Ð;ívûééQ¯è‚~ŽBqQ Ž¸üŠB‡_=9·:¤ˆŠ"»UÄásÏMï̇.rþXCº°x•…–(_g¾z;.6ñˆ§ØãØD,YeÑkLª[šï:óÖÛ"=¡aYTÃn”ïR - +ÒQéŒêÀF%Ö¨G3-4ÍBW(üJÚ§Ñß1ò¯ ÑßÕi¿W{ÙPµ—èˆñ2â÷ÆÍ‹ô 9Gâdi+èF@&Pƒh•²D•y¯­ªn¢JE+ ¯T½Z¸-ÊÑV«E,%ʵƒ4KV[-‡‡ ¹nÎ>)…ÁéBÁå÷Æ󹤱m°©ùÕ…€êÛfJ¹d\ìéߟScä?Ð×È;Y%sW-A¾ŽežªååhõXÔ4Ž4«ëë?ú™4—õ›Ðl<à±Ú¤Aä4líSôôÚªË#çfùœmt]…7󉸞¯}F>3pïQУï€eÈ¿=îàz¬W ƒd¸*£d¬¬íR\ŠPaÊš‘o0u êŠ((‘âŠÌ[¶,¢®h†alÛ°C…é +já“[ÌÕ"A”UÍ¥„#•Mçð:Â\Jäv×hÒZs\â$(4”é£zD—ì–|!÷ú¿‘I–É«¶¼aá Y’I úåÈ{{¨[T$YA‚:tóïè÷„Ljd°Û椥‚Ý Ë°â(Œ$§ßôûU+BÑù]çºñkaKô ZÄUèº*Ddƒ±€Q3š£úXÁ5ôÄÌ‘à5fL•Çï=öÄ1Z¿üµ×~®_xêÈo_xôsŸÞŸ¼ø$A;Ú!Ä&¦ã©·]èÄDÈ(ÚJ’B£±<à_<Ü\<~4ÊÍ…yTKs’ªV$n ÐÜG°1Ôc\QÅ( ŒR Ð=oìéõ†Ó³…Üú»†ñÈÀ N¯u(ÇãÆ{‘IÑLš¾Ñoû™h¾8ö“?"íâhÇÌbaöYd^`Á,¢L ‘׫,}‘:ªR[Ks=Šžš +ÉX¼ƒÕ<ÖëVÃV,³p…™VX³„à N T.Љ·2¶ E®-Z$¡"‘ׄ‚!fS} +æ­!W«SNdAjA::°¤9ÐW@JfÁ¾Ø–݆=‡¨ ªg)š¶QP;çj»r]A)_B ¦æ>ü̳H¥<ÍJ¡àúoÒPþpïh¾¨wÅ&ö쌉q·Äwž9sxá±/…ô®œ7àux ú÷}È;êä·^€oä% ˆÀKyÙ#__í‹Ôt}LGúÕ‘$²µlµz§½œ§½žâŠ§7í.®‚cÙ^ 2ïµ=Üz=.àz¯j© ÃeÐå¸{ÒÀŠdËR{µðJ ÞWëLƒ"€–[¬Oc½œMŽjÇ6õÚ,³0¨OsE»¥ÿ© ÷dÆ{&¸s{ïH2åÍ‚WTh:,ü}kòȹ=xhéàŒê ‹^=wldøX$1iÈ’Í“ùãü¥»-Ï€rŸM|vŽÜs›×n§Eèç—¯²ä5ˆ!«¶Ós¹4v$6QD9mLöóáa ÿàÑ—§r•|¾ò þ O×û ”«1ð¿ÐÓ_¿#iŽØ’ÀŽ£U4 +AÑJp7 š7ä¸x°Ã^WÈ¡— +ùõg “>Õ‰÷«n1“¶2ÜýÃþPžŸ ¤•ö$i¬ƒÇÙv¢ÐU¯×Û^tå¢7Pd•†Õ 
diff --git a/src/zc/Zcmd_footer.adoc b/src/zc/Zcmd_footer.adoc new file mode 100644 index 0000000..8fdcd87 --- /dev/null +++ b/src/zc/Zcmd_footer.adoc @@ -0,0 +1,12 @@ + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zcmd (<>) +|0.1 +|Development +|=== diff --git a/src/zc/Zcmp_footer.adoc b/src/zc/Zcmp_footer.adoc new file mode 100644 index 0000000..b0d3d4a --- /dev/null +++ b/src/zc/Zcmp_footer.adoc @@ -0,0 +1,12 @@ + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zcmp (<>) +|{version-label} +|{lifecycle-state} +|=== diff --git a/src/zc/Zcmpe_footer.adoc b/src/zc/Zcmpe_footer.adoc new file mode 100644 index 0000000..1e7ba38 --- /dev/null +++ b/src/zc/Zcmpe_footer.adoc @@ -0,0 +1,12 @@ + +Included
in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zcmpe (<>) +|{version-label} +|Stable +|=== diff --git a/src/zc/Zcmt_footer.adoc b/src/zc/Zcmt_footer.adoc new file mode 100644 index 0000000..5206794 --- /dev/null +++ b/src/zc/Zcmt_footer.adoc @@ -0,0 +1,12 @@ + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zcmt (<>) +|{version-label} +|{lifecycle-state} +|=== diff --git a/src/zc/c_lbsb_imm_offset.adoc b/src/zc/c_lbsb_imm_offset.adoc new file mode 100644 index 0000000..dd7bf83 --- /dev/null +++ b/src/zc/c_lbsb_imm_offset.adoc @@ -0,0 +1,8 @@ + +The immediate offset is formed as follows: +[source,sail] +-- + uimm[31:2] = 0; + uimm[1] = encoding[5]; + uimm[0] = encoding[6]; +-- diff --git a/src/zc/c_lbu.adoc b/src/zc/c_lbu.adoc new file mode 100644 index 0000000..3928373 --- /dev/null +++ b/src/zc/c_lbu.adoc @@ -0,0 +1,46 @@ +<<< +[#insns-c_lbu,reftext="Load unsigned byte, 16-bit encoding"] +=== c.lbu + +Synopsis:: +Load unsigned byte, 16-bit encoding + +Mnemonic:: +c.lbu _rd'_, _uimm_(_rs1'_) + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x0, attr: ['C0'] }, + { bits: 3, name: 'rd\'' }, + { bits: 2, name: 'uimm[0|1]' }, + { bits: 3, name: 'rs1\'' }, + { bits: 3, name: 0x0 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +include::c_lbsb_imm_offset.adoc[] + +Description:: +This instruction loads a byte from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting byte is zero extended to XLEN bits and is written to _rd'_. + +[NOTE] + _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. + +Prerequisites:: +None + +32-bit equivalent:: +<> + +Operation:: +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
+ +X(rdc) = EXTZ(mem[X(rs1c)+EXTZ(uimm)][7..0]); +-- + +include::Zcb_footer.adoc[] diff --git a/src/zc/c_lh.adoc b/src/zc/c_lh.adoc new file mode 100644 index 0000000..e519754 --- /dev/null +++ b/src/zc/c_lh.adoc @@ -0,0 +1,48 @@ +<<< +[#insns-c_lh,reftext="Load signed halfword, 16-bit encoding"] +=== c.lh + +Synopsis:: +Load signed halfword, 16-bit encoding + +Mnemonic:: +c.lh _rd'_, _uimm_(_rs1'_) + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x0, attr: ['C0'] }, + { bits: 3, name: 'rd\'' }, + { bits: 1, name: 'uimm[1]' }, + { bits: 1, name: 0x1 }, + { bits: 3, name: 'rs1\'' }, + { bits: 3, name: 0x1 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +include::c_lhsh_imm_offset.adoc[] + +Description:: +This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting halfword is sign extended to XLEN bits and is written to _rd'_. + +[NOTE] + _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. + +Prerequisites:: +None + +32-bit equivalent:: +<> + +Operation:: +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +X(rdc) = EXTS(load_mem[X(rs1c)+EXTZ(uimm)][15..0]); +-- + +include::Zcb_footer.adoc[] + diff --git a/src/zc/c_lhsh_imm_offset.adoc b/src/zc/c_lhsh_imm_offset.adoc new file mode 100644 index 0000000..20f1b2b --- /dev/null +++ b/src/zc/c_lhsh_imm_offset.adoc @@ -0,0 +1,8 @@ + +The immediate offset is formed as follows: +[source,sail] +-- + uimm[31:2] = 0; + uimm[1] = encoding[5]; + uimm[0] = 0; +-- diff --git a/src/zc/c_lhu.adoc b/src/zc/c_lhu.adoc new file mode 100644 index 0000000..6db5211 --- /dev/null +++ b/src/zc/c_lhu.adoc @@ -0,0 +1,48 @@ +<<< +[#insns-c_lhu,reftext="Load unsigned halfword, 16-bit encoding"] +=== c.lhu + +Synopsis:: +Load unsigned halfword, 16-bit encoding + +Mnemonic:: +c.lhu _rd'_, _uimm_(_rs1'_) + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... 
+{reg:[ + { bits: 2, name: 0x0, attr: ['C0'] }, + { bits: 3, name: 'rd\'' }, + { bits: 1, name: 'uimm[1]' }, + { bits: 1, name: 0x0 }, + { bits: 3, name: 'rs1\'' }, + { bits: 3, name: 0x1 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +include::c_lhsh_imm_offset.adoc[] + +Description:: +This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting halfword is zero extended to XLEN bits and is written to _rd'_. + +[NOTE] + _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. + +Prerequisites:: +None + +32-bit equivalent:: +<> + +Operation:: +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +X(rdc) = EXTZ(load_mem[X(rs1c)+EXTZ(uimm)][15..0]); +-- + +include::Zcb_footer.adoc[] + diff --git a/src/zc/c_mul.adoc b/src/zc/c_mul.adoc new file mode 100644 index 0000000..d2f5a21 --- /dev/null +++ b/src/zc/c_mul.adoc @@ -0,0 +1,48 @@ +<<< +[#insns-c_mul,reftext="Multiply, 16-bit encoding"] +=== c.mul + +Synopsis:: +Multiply, 16-bit encoding + +Mnemonic:: +c.mul _rsd'_, _rs2'_ + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x1, attr: ['C1'] }, + { bits: 3, name: 'rs2\'', attr: ['SRC2'] }, + { bits: 2, name: 0x2, attr: ['FUNCT2'] }, + { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, + { bits: 3, name: 0x7 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +Description:: +This instruction multiplies XLEN bits of the source operands from _rsd'_ and _rs2'_ and writes the lowest XLEN bits of the result to _rsd'_. + +[NOTE] + _rd'/rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. + +Prerequisites:: +M or Zmmul must be configured. + +32-bit equivalent:: +<> + +[NOTE] + + The SAIL module variable for _rd'/rs1'_ is called _rsdc_, and for _rs2'_ is called _rs2c_. 
+ +Operation:: +[source,sail] +-- +let result_wide = to_bits(2 * sizeof(xlen), signed(X(rsdc)) * signed(X(rs2c))); +X(rsdc) = result_wide[(sizeof(xlen) - 1) .. 0]; +-- + +include::Zcb_footer.adoc[] + diff --git a/src/zc/c_not.adoc b/src/zc/c_not.adoc new file mode 100644 index 0000000..4207ba0 --- /dev/null +++ b/src/zc/c_not.adoc @@ -0,0 +1,50 @@ +<<< +[#insns-c_not,reftext="Bitwise not, 16-bit encoding"] +=== c.not + +Synopsis:: +Bitwise not, 16-bit encoding + +Mnemonic:: +c.not _rd'/rs1'_ + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x1, attr: ['C1'] }, + { bits: 3, name: 0x5, attr: ['C.NOT'] }, + { bits: 2, name: 0x3, attr: ['FUNCT2'] }, + { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, + { bits: 3, name: 0x7 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +Description:: +This instruction takes the one's complement of _rd'/rs1'_ and writes the result to the same register. + +[NOTE] + _rd'/rs1'_ is from the standard 8-register set x8-x15. + +Prerequisites:: +None + +32-bit equivalent:: +[source,sail] +-- +xori rd'/rs1', rd'/rs1', -1 +-- + +[NOTE] + + The SAIL module variable for _rd'/rs1'_ is called _rsdc_. + +Operation:: +[source,sail] +-- +X(rsdc) = X(rsdc) XOR -1; +-- + +include::Zcb_footer.adoc[] + diff --git a/src/zc/c_sb.adoc b/src/zc/c_sb.adoc new file mode 100644 index 0000000..d0b1ac6 --- /dev/null +++ b/src/zc/c_sb.adoc @@ -0,0 +1,46 @@ +<<< +[#insns-c_sb,reftext="Store byte, 16-bit encoding"] +=== c.sb + +Synopsis:: +Store byte, 16-bit encoding + +Mnemonic:: +c.sb _rs2'_, _uimm_(_rs1'_) + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x0, attr: ['C0'] }, + { bits: 3, name: 'rs2\'' }, + { bits: 2, name: 'uimm[0|1]' }, + { bits: 3, name: 'rs1\'' }, + { bits: 3, name: 0x2 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... 
+ +include::c_lbsb_imm_offset.adoc[] + +Description:: +This instruction stores the least significant byte of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. + +[NOTE] + _rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. + +Prerequisites:: +None + +32-bit equivalent:: +<> + +Operation:: +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +mem[X(rs1c)+EXTZ(uimm)][7..0] = X(rs2c) +-- + +include::Zcb_footer.adoc[] diff --git a/src/zc/c_sext_b.adoc b/src/zc/c_sext_b.adoc new file mode 100644 index 0000000..bcf8f15 --- /dev/null +++ b/src/zc/c_sext_b.adoc @@ -0,0 +1,48 @@ +<<< +[#insns-c_sext_b,reftext="Sign extend byte, 16-bit encoding"] +=== c.sext.b + +Synopsis:: +Sign extend byte, 16-bit encoding + +Mnemonic:: +c.sext.b _rd'/rs1'_ + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x1, attr: ['C1'] }, + { bits: 3, name: 0x1, attr: ['C.SEXT.B'] }, + { bits: 2, name: 0x3, attr: ['FUNCT2'] }, + { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, + { bits: 3, name: 0x7 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +Description:: +This instruction takes a single source/destination operand. +It sign-extends the least-significant byte in the operand to XLEN bits by copying the most-significant bit +in the byte (i.e., bit 7) to all of the more-significant bits. + +[NOTE] + _rd'/rs1'_ is from the standard 8-register set x8-x15. + +Prerequisites:: +Zbb is also required. + +32-bit equivalent:: +<> from Zbb + +[NOTE] + + The SAIL module variable for _rd'/rs1'_ is called _rsdc_. 
+ +Operation:: +[source,sail] +-- +X(rsdc) = EXTS(X(rsdc)[7..0]); +-- + +include::Zcb_footer.adoc[] diff --git a/src/zc/c_sext_h.adoc b/src/zc/c_sext_h.adoc new file mode 100644 index 0000000..82a64db --- /dev/null +++ b/src/zc/c_sext_h.adoc @@ -0,0 +1,49 @@ +<<< +[#insns-c_sext_h,reftext="Sign extend halfword, 16-bit encoding"] +=== c.sext.h + +Synopsis:: +Sign extend halfword, 16-bit encoding + +Mnemonic:: +c.sext.h _rd'/rs1'_ + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x1, attr: ['C1'] }, + { bits: 3, name: 0x3, attr: ['C.SEXT.H'] }, + { bits: 2, name: 0x3, attr: ['FUNCT2'] }, + { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, + { bits: 3, name: 0x7 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +Description:: +This instruction takes a single source/destination operand. +It sign-extends the least-significant halfword in the operand to XLEN bits by copying the most-significant bit +in the halfword (i.e., bit 15) to all of the more-significant bits. + +[NOTE] + _rd'/rs1'_ is from the standard 8-register set x8-x15. + +Prerequisites:: +Zbb is also required. + +32-bit equivalent:: +<> from Zbb + +[NOTE] + + The SAIL module variable for _rd'/rs1'_ is called _rsdc_. + +Operation:: +[source,sail] +-- +X(rsdc) = EXTS(X(rsdc)[15..0]); +-- + +include::Zcb_footer.adoc[] + diff --git a/src/zc/c_sh.adoc b/src/zc/c_sh.adoc new file mode 100644 index 0000000..977a887 --- /dev/null +++ b/src/zc/c_sh.adoc @@ -0,0 +1,48 @@ +<<< +[#insns-c_sh,reftext="Store halfword, 16-bit encoding"] +=== c.sh + +Synopsis:: +Store halfword, 16-bit encoding + +Mnemonic:: +c.sh _rs2'_, _uimm_(_rs1'_) + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x0, attr: ['C0'] }, + { bits: 3, name: 'rs2\'' }, + { bits: 1, name: 'uimm[1]' }, + { bits: 1, name: '0' }, + { bits: 3, name: 'rs1\'' }, + { bits: 3, name: 0x3 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... 
+ +include::c_lhsh_imm_offset.adoc[] + +Description:: +This instruction stores the least significant halfword of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. + +[NOTE] + _rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. + +Prerequisites:: +None + +32-bit equivalent:: +<> + +Operation:: +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +mem[X(rs1c)+EXTZ(uimm)][15..0] = X(rs2c) +-- + +include::Zcb_footer.adoc[] + diff --git a/src/zc/c_zca_required.adoc b/src/zc/c_zca_required.adoc new file mode 100644 index 0000000..f7b460c Binary files /dev/null and b/src/zc/c_zca_required.adoc differ diff --git a/src/zc/c_zext_b.adoc b/src/zc/c_zext_b.adoc new file mode 100644 index 0000000..500461d --- /dev/null +++ b/src/zc/c_zext_b.adoc @@ -0,0 +1,52 @@ +<<< +[#insns-c_zext_b,reftext="Zero extend byte, 16-bit encoding"] +=== c.zext.b + +Synopsis:: +Zero extend byte, 16-bit encoding + +Mnemonic:: +c.zext.b _rd'/rs1'_ + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x1, attr: ['C1'] }, + { bits: 3, name: 0x0, attr: ['C.ZEXT.B'] }, + { bits: 2, name: 0x3, attr: ['FUNCT2'] }, + { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, + { bits: 3, name: 0x7 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +Description:: +This instruction takes a single source/destination operand. +It zero-extends the least-significant byte of the operand to XLEN bits by inserting zeros into all of +the bits more significant than 7. + +[NOTE] + _rd'/rs1'_ is from the standard 8-register set x8-x15. + +Prerequisites:: +None + +32-bit equivalent:: +[source,sail] +-- +andi rd'/rs1', rd'/rs1', 0xff +-- + +[NOTE] + + The SAIL module variable for _rd'/rs1'_ is called _rsdc_. 
+ +Operation:: +[source,sail] +-- +X(rsdc) = EXTZ(X(rsdc)[7..0]); +-- + +include::Zcb_footer.adoc[] + diff --git a/src/zc/c_zext_h.adoc b/src/zc/c_zext_h.adoc new file mode 100644 index 0000000..5999857 --- /dev/null +++ b/src/zc/c_zext_h.adoc @@ -0,0 +1,49 @@ +<<< +[#insns-c_zext_h,reftext="Zero extend halfword, 16-bit encoding"] +=== c.zext.h + +Synopsis:: +Zero extend halfword, 16-bit encoding + +Mnemonic:: +c.zext.h _rd'/rs1'_ + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x1, attr: ['C1'] }, + { bits: 3, name: 0x2, attr: ['C.ZEXT.H'] }, + { bits: 2, name: 0x3, attr: ['FUNCT2'] }, + { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, + { bits: 3, name: 0x7 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +Description:: +This instruction takes a single source/destination operand. +It zero-extends the least-significant halfword of the operand to XLEN bits by inserting zeros into all of +the bits more significant than 15. + +[NOTE] + _rd'/rs1'_ is from the standard 8-register set x8-x15. + +Prerequisites:: +Zbb is also required. + +32-bit equivalent:: +<> from Zbb + +[NOTE] + + The SAIL module variable for _rd'/rs1'_ is called _rsdc_. + +Operation:: +[source,sail] +-- +X(rsdc) = EXTZ(X(rsdc)[15..0]); +-- + +include::Zcb_footer.adoc[] + diff --git a/src/zc/c_zext_w.adoc b/src/zc/c_zext_w.adoc new file mode 100644 index 0000000..3540405 --- /dev/null +++ b/src/zc/c_zext_w.adoc @@ -0,0 +1,51 @@ +<<< +[#insns-c_zext_w,reftext="Zero extend word, 16-bit encoding"] +=== c.zext.w + +Synopsis:: +Zero extend word, 16-bit encoding + +Mnemonic:: +c.zext.w _rd'/rs1'_ + +Encoding (RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x1, attr: ['C1'] }, + { bits: 3, name: 0x4, attr: ['C.ZEXT.W'] }, + { bits: 2, name: 0x3, attr: ['FUNCT2'] }, + { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, + { bits: 3, name: 0x7 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... 
+ +Description:: +This instruction takes a single source/destination operand. +It zero-extends the least-significant word of the operand to XLEN bits by inserting zeros into all of +the bits more significant than 31. + +[NOTE] + _rd'/rs1'_ is from the standard 8-register set x8-x15. + +Prerequisites:: +Zba is also required. + +32-bit equivalent:: +[source,sail] +-- +add.uw rd'/rs1', rd'/rs1', zero +-- + +[NOTE] + + The SAIL module variable for _rd'/rs1'_ is called _rsdc_. + +Operation:: +[source,sail] +-- +X(rsdc) = EXTZ(X(rsdc)[31..0]); +-- + +include::Zcb_footer.adoc[] diff --git a/src/zc/changes_since_v0.50.adoc b/src/zc/changes_since_v0.50.adoc new file mode 100644 index 0000000..a4452b1 --- /dev/null +++ b/src/zc/changes_since_v0.50.adoc @@ -0,0 +1,130 @@ + +There are many changes since v0.50.1, which has been used for the toolchain, Spike, QEMU and the CV32E41P implementation. + +The status of each instruction is given in the tables. Note that _all_ subsets have been redefined. + += Load/store + +.Load/store +[options="header",width=100%] +|==================================================================================== +| v0.50 name | v0.70 Name | Encoding changed? | Semantics changed?
| Notes +| C.LB | CM.LB | Y | N | uimm < 4 is "custom defined" +| C.LBU | CM.LBU | Y | N | uimm < 4 is "custom defined" +| C.LH | CM.LH | Y | N | uimm < 4 is "custom defined" +| C.LHU | CM.LHU | Y | N | uimm < 4 is "custom defined" +| C.SB | CM.SB | Y | N | uimm < 4 is "custom defined" +| C.SH | CM.SH | Y | N | uimm < 4 is "custom defined" +| N/A | C.LBU | N/A | N/A | CM.LBU with shorter uimm +| N/A | C.LH | N/A | N/A | CM.LH with shorter uimm +| N/A | C.LHU | N/A | N/A | CM.LHU with shorter uimm +| N/A | C.SB | N/A | N/A | CM.SB with shorter uimm +| N/A | C.SH | N/A | N/A | CM.SH with shorter uimm +|==================================================================================== + += Table jump + +.Table Jump +[options="header",width=100%] +|==================================================================================== +| v0.50 name | v0.70 Name | Encoding changed? | Semantics changed? | Notes +| C.TBLJAL | CM.JALT | Y | Y - exception model| Meaning of table index changed in the encoding, # removed from assembly syntax +| C.TBLJ | CM.J | Y | Y - exception model| Meaning of table index changed in the encoding, # removed from assembly syntax +| C.TBLJALM | N/A | N/A | N/A | Deleted +|==================================================================================== + +See this https://github.com/riscv/riscv-code-size-reduction/commit/8ba5b0fdf05d6fd5af118ba5301910d049abd1a8#diff-8d03bd23cf9ec0eb75984f7c6d4181aa9548acb5898dc9159514e24398076836[commit] for the change in the table jump exception model. + += Double move + +.Double move +[options="header",width=100%] +|==================================================================================== +| v0.50 name | v0.70 Name | Encoding changed? | Semantics changed?
| Notes +| C.MVA01S07 | CM.MVA01S | Y | N | +| N/A | CM.MVSA01 | N/A | N/A | New instruction +|==================================================================================== + +Note that the .E extension versions for the EABI will be specified in the future, and cannot yet be confirmed as the EABI is not frozen. + += Simple instructions + +.Simple instructions +[options="header",width=100%] +|==================================================================================== +| v0.50 name | v0.70 Name | Encoding changed? | Semantics changed? | Notes +| C.ZEXT.B | same | Y | N | +| C.ZEXT.H | same | Y | N | +| C.SEXT.B | same | Y | N | +| C.SEXT.H | same | Y | N | +| C.SEXT.W | same | Y | N | +| C.NOT | same | Y | N | +| C.MUL | same | N | N | unchanged +|==================================================================================== + += Push/pop + +All 32-bit forms are removed and all the 16-bit forms support 12 register lists (excluding {ra, s0-s10}): + +. {ra} +. {ra, s0} +. {ra, s0-s1} +. {ra, s0-s2} +. {ra, s0-s3} +. {ra, s0-s4} +. {ra, s0-s5} +. {ra, s0-s6} +. {ra, s0-s7} +. {ra, s0-s8} +. {ra, s0-s9} +. {ra, s0-s11} + +spimm length also updated. + +Note that the .E extension versions for the EABI will be specified in the future, and cannot yet be confirmed as the EABI is not frozen. + +.Push/pop instructions +[options="header",width=100%] +|==================================================================================== +| v0.50 name | v0.70 Name | Encoding changed? | Semantics changed? 
| Notes +| C.PUSH | CM.PUSH | Y | Y | areg_list no longer supported +| C.POP | CM.POP | Y | Y | +| C.POPRET | CM.POPRET | Y | Y | CM.POPRET doesn't return a value +| C.POPRET | CM.POPRETZ | Y | Y | separate encoding for return zero +|==================================================================================== + += Instructions in v0.50 but *not* in v0.70 + +These instructions can be left in the compiler as experimental, enabled with the following switches: + +[#compilerswitches] +.Compiler switches experimental instructions +[options="header",width=100%] +|============================================================================== +| Switch | Enabled instructions +| -mzce-lsgp | LWGP, SWGP, LDGP (RV64), SDGP (RV64) +| -mzce-muli | MULI +| -mzce-beqi | BEQI +| -mzce-bnei | BNEI +| -mzce-cdecbnez | C.DECBNEZ +| -mzce-decbnez | DECBNEZ +|============================================================================== + +== 16-bit Instructions + +C.DECBNEZ - the encoding space for this has been used by all the CM.* instructions. +Therefore this instruction must be disabled in the compiler - unless an encoding is proposed. + +C.NEG - this is not very useful and can be deleted. + +== 32-bit Instructions + +MULI - This is in custom-0, so can be kept unchanged. Early benchmarking results suggest it's not much use, and the encoding is expensive so it's unlikely to ever be included in an extension. 
+ +BEQI, BNEI - these fill the two gaps in the BRANCH encoding group; the encodings have not been allocated to other instructions, so they can stay unchanged. + +DECBNEZ - this should be updated to match https://github.com/riscv/riscv-code-size-reduction/blob/master/Zce-release-candidate/Zcmd.pdf + +LWGP, SWGP, LDGP, SDGP - these overlap with C.FLD, C.FSD. + +PUSH/POP/POPRET - delete all of these. diff --git a/src/zc/cm_decbnez.adoc b/src/zc/cm_decbnez.adoc new file mode 100644 index 0000000..912b768 --- /dev/null +++ b/src/zc/cm_decbnez.adoc @@ -0,0 +1,50 @@ +<<< +[#insns-cm_decbnez,reftext="Decrement and branch, 16-bit encoding"] +=== cm.decbnez: This is in the _development_ phase, for benchmarking and prototyping only + +Synopsis:: +Decrement and branch, 16-bit encoding + +Mnemonic:: +cm.decbnez _t0_, _offset_ + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 6, name: 'imm[6|7|3:1|5]', attr: [] }, + { bits: 1, name: 0x1, attr: [] }, + { bits: 3, name: 'imm[4|9:8]', attr: [] }, + { bits: 1, name: 0x1, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +[NOTE] + + In the current proposal only t0 can be decremented; future versions may allow more registers. + +Description:: +This instruction decrements _t0_, and increments the PC by the sign extended immediate if _t0_ is zero *after* the decrement. + +Prerequisites:: +C or Zca + +32-bit equivalent:: +None + +Operation:: +[source,sail] +-- + +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet.
+ +t0 = 5; +X(t0) = X(t0)-1; +if (X(t0)==0) PC+=sext(imm); else PC+=2; + +-- + +include::Zcmd_footer.adoc[] + diff --git a/src/zc/cm_jalt.adoc b/src/zc/cm_jalt.adoc new file mode 100644 index 0000000..372d933 --- /dev/null +++ b/src/zc/cm_jalt.adoc @@ -0,0 +1,74 @@ +<<< +[#insns-cm_jalt,reftext="Jump and link via table"] +=== cm.jalt + +Synopsis:: +jump via table with optional link + +Mnemonic:: +cm.jalt _index_ + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 8, name: 'index', attr: [] }, + { bits: 3, name: 0x0, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +[NOTE] + + For this encoding to decode as _cm.jalt_, _index>=32_, otherwise it decodes as _cm.jt_, see <>. + +[NOTE] + + If JVT.mode = 0 (Jump Table Mode) then _cm.jalt_ behaves as specified here. If JVT.mode is a reserved value, then _cm.jalt_ is also reserved. In the future other defined values of JVT.mode may change the behaviour of _cm.jalt_. + +Assembly Syntax:: + +[source,sail] +-- +cm.jalt index +-- + +Description:: + +_cm.jalt_ reads an entry from the jump vector table in memory and jumps to the address that was read, linking to _ra_. + +For further information see <>. + +Prerequisites:: +None + +32-bit equivalent:: +No direct equivalent encoding exists. + +<<< + +[#insns-cm_jalt-SAIL,reftext="cm.jalt SAIL code"] +Operation:: + +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
+ +# target_address is temporary internal state, it doesn't represent a real register +# InstMemory is byte indexed + +switch(XLEN) { + 32: table_address[XLEN-1:0] = JVT.base + (index<<2); + 64: table_address[XLEN-1:0] = JVT.base + (index<<3); +} + +//fetch from the jump table +target_address[XLEN-1:0] = InstMemory[table_address][XLEN-1:0]; + +jal ra, target_address[XLEN-1:0]&~0x1; + +-- + +include::Zcmt_footer.adoc[] + diff --git a/src/zc/cm_jt.adoc b/src/zc/cm_jt.adoc new file mode 100644 index 0000000..8c7f67d --- /dev/null +++ b/src/zc/cm_jt.adoc @@ -0,0 +1,74 @@ +<<< +[#insns-cm_jt,reftext="Jump via table"] +=== cm.jt + +Synopsis:: +jump via table + +Mnemonic:: +cm.jt _index_ + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 8, name: 'index', attr: [] }, + { bits: 3, name: 0x0, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +[NOTE] + + For this encoding to decode as _cm.jt_, _index<32_, otherwise it decodes as _cm.jalt_, see <>. + +[NOTE] + + If JVT.mode = 0 (Jump Table Mode) then _cm.jt_ behaves as specified here. If JVT.mode is a reserved value, then _cm.jt_ is also reserved. In the future other defined values of JVT.mode may change the behaviour of _cm.jt_. + +Assembly Syntax:: + +[source,sail] +-- +cm.jt index +-- + +Description:: + +_cm.jt_ reads an entry from the jump vector table in memory and jumps to the address that was read. + +For further information see <>. + +Prerequisites:: +None + +32-bit equivalent:: +No direct equivalent encoding exists. + +<<< + +[#insns-cm_jt-SAIL,reftext="cm.jt SAIL code"] +Operation:: + +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
+ +# target_address is temporary internal state, it doesn't represent a real register +# InstMemory is byte indexed + +switch(XLEN) { + 32: table_address[XLEN-1:0] = JVT.base + (index<<2); + 64: table_address[XLEN-1:0] = JVT.base + (index<<3); +} + +//fetch from the jump table +target_address[XLEN-1:0] = InstMemory[table_address][XLEN-1:0]; + +j target_address[XLEN-1:0]&~0x1; + +-- + +include::Zcmt_footer.adoc[] + diff --git a/src/zc/cm_lb.adoc b/src/zc/cm_lb.adoc new file mode 100644 index 0000000..525ba97 --- /dev/null +++ b/src/zc/cm_lb.adoc @@ -0,0 +1,47 @@ +<<< +[#insns-cm_lb,reftext="Load signed byte, 16-bit encoding"] +=== cm.lb + +Synopsis:: +Load signed byte, 16-bit encoding + +Mnemonic:: +cm.lb _rd'_, _uimm_(_rs1'_) + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x0, attr: ['C0'] }, + { bits: 3, name: 'rd\'' }, + { bits: 2, name: 'uimm[2:1]' }, + { bits: 3, name: 'rs1\'' }, + { bits: 2, name: 'uimm[0|3]' }, + { bits: 1, name: 0x0 }, + { bits: 3, name: 0x1, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +include::cm_lbsb_imm_offset.adoc[] + +Description:: +This instruction loads a byte from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting byte is sign extended to XLEN bits and is written to _rd'_. + +[NOTE] + _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. + +Prerequisites:: +None + +32-bit equivalent:: +<> + +Operation:: +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
+ +X(rdc) = EXTS(mem[X(rs1c)+EXTZ(uimm)][7..0]); +-- + +include::Zcmb_footer.adoc[] diff --git a/src/zc/cm_lbsb_imm_offset.adoc b/src/zc/cm_lbsb_imm_offset.adoc new file mode 100644 index 0000000..4df7702 --- /dev/null +++ b/src/zc/cm_lbsb_imm_offset.adoc @@ -0,0 +1,9 @@ + +The immediate offset is formed as follows: +[source,sail] +-- + uimm[31:4] = 0; + uimm[3] = encoding[10]; + uimm[2:1] = encoding[6:5]; + uimm[0] = encoding[11]; +-- diff --git a/src/zc/cm_lbu.adoc b/src/zc/cm_lbu.adoc new file mode 100644 index 0000000..7e9735f --- /dev/null +++ b/src/zc/cm_lbu.adoc @@ -0,0 +1,50 @@ +<<< +[#insns-cm_lbu,reftext="Load unsigned byte, 16-bit encoding"] +=== cm.lbu + +Synopsis:: +Load unsigned byte, 16-bit encoding + +Mnemonic:: +cm.lbu _rd'_, _uimm_(_rs1'_) + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 3, name: 'rd\'' }, + { bits: 2, name: 'uimm[2:1]' }, + { bits: 3, name: 'rs1\'' }, + { bits: 2, name: 'uimm[0|3]' }, + { bits: 1, name: 0x0 }, + { bits: 3, name: 0x1, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +[NOTE] + If _uimm < 4_ the encoding is designated for custom use, as the functionality overlaps with <>. + +include::cm_lbsb_imm_offset.adoc[] + +Description:: +This instruction loads a byte from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting byte is zero extended to XLEN bits and is written to _rd'_. + +[NOTE] + _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. + +Prerequisites:: +None + +32-bit equivalent:: +<> + +Operation:: +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
+ +X(rdc) = EXTZ(mem[X(rs1c)+EXTZ(uimm)][7..0]); +-- + +include::Zcmb_footer.adoc[] diff --git a/src/zc/cm_lh.adoc b/src/zc/cm_lh.adoc new file mode 100644 index 0000000..bb1b6b9 --- /dev/null +++ b/src/zc/cm_lh.adoc @@ -0,0 +1,51 @@ +<<< +[#insns-cm_lh,reftext="Load signed halfword, 16-bit encoding"] +=== cm.lh + +Synopsis:: +Load signed halfword, 16-bit encoding + +Mnemonic:: +cm.lh _rd'_, _uimm_(_rs1'_) + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x0, attr: ['C0'] }, + { bits: 3, name: 'rd\'' }, + { bits: 2, name: 'uimm[2:1]' }, + { bits: 3, name: 'rs1\'' }, + { bits: 2, name: 'uimm[4:3]' }, + { bits: 1, name: 0x1 }, + { bits: 3, name: 0x1, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +[NOTE] + If _uimm < 4_ the encoding is designated for custom use, as the functionality overlaps with <>. + +include::cm_lhsh_imm_offset.adoc[] + +Description:: +This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting halfword is sign extended to XLEN bits and is written to _rd'_. + +[NOTE] + _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. + +Prerequisites:: +None + +32-bit equivalent:: +<> + +Operation:: +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
+ +X(rdc) = EXTS(load_mem[X(rs1c)+EXTZ(uimm)][15..0]); +-- + +include::Zcmb_footer.adoc[] + diff --git a/src/zc/cm_lhsh_imm_offset.adoc b/src/zc/cm_lhsh_imm_offset.adoc new file mode 100644 index 0000000..1aa6bc8 --- /dev/null +++ b/src/zc/cm_lhsh_imm_offset.adoc @@ -0,0 +1,9 @@ + +The immediate offset is formed as follows: +[source,sail] +-- + uimm[31:5] = 0; + uimm[4:3] = encoding[11:10]; + uimm[2:1] = encoding[6:5]; + uimm[0] = 0; +-- diff --git a/src/zc/cm_lhu.adoc b/src/zc/cm_lhu.adoc new file mode 100644 index 0000000..3a3c281 --- /dev/null +++ b/src/zc/cm_lhu.adoc @@ -0,0 +1,51 @@ +<<< +[#insns-cm_lhu,reftext="Load unsigned halfword, 16-bit encoding"] +=== cm.lhu + +Synopsis:: +Load unsigned halfword, 16-bit encoding + +Mnemonic:: +cm.lhu _rd'_, _uimm_(_rs1'_) + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 3, name: 'rd\'' }, + { bits: 2, name: 'uimm[2:1]' }, + { bits: 3, name: 'rs1\'' }, + { bits: 2, name: 'uimm[4:3]' }, + { bits: 1, name: 0x1 }, + { bits: 3, name: 0x1, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +[NOTE] + If _uimm < 4_ the encoding is designated for custom use, as the functionality overlaps with <>. + +include::cm_lhsh_imm_offset.adoc[] + +Description:: +This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting halfword is zero extended to XLEN bits and is written to _rd'_. + +[NOTE] + _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. + +Prerequisites:: +None + +32-bit equivalent:: +<> + +Operation:: +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
+
+X(rdc) = EXTZ(load_mem[X(rs1c)+EXTZ(uimm)][15..0]);
+--
+
+include::Zcmb_footer.adoc[]
+
diff --git a/src/zc/cm_mva01s.adoc b/src/zc/cm_mva01s.adoc
new file mode 100644
index 0000000..9d36688
--- /dev/null
+++ b/src/zc/cm_mva01s.adoc
@@ -0,0 +1,62 @@
+<<<
+[#insns-cm_mva01s,reftext="Move two s0-s7 registers into a0-a1"]
+=== cm.mva01s
+
+Synopsis::
+Move two s0-s7 registers into a0-a1
+
+Mnemonic::
+cm.mva01s _r1s'_, _r2s'_
+
+Encoding (RV32, RV64)::
+[wavedrom, , svg]
+....
+{reg:[
+ { bits: 2, name: 0x2, attr: ['C2'] },
+ { bits: 3, name: 'r2s\'', attr: [] },
+ { bits: 2, name: 0x3, attr: [] },
+ { bits: 3, name: 'r1s\'', attr: [] },
+ { bits: 3, name: 0x3, attr: [] },
+ { bits: 3, name: 0x5, attr: ['FUNCT3'] },
+],config:{bits:16}}
+....
+
+Assembly Syntax::
+
+[source,sail]
+--
+cm.mva01s r1s', r2s'
+--
+
+Description::
+This instruction moves _r1s'_ into _a0_ and _r2s'_ into _a1_.
+The execution is atomic, so it is not possible to observe state where only one of _a0_ or _a1_ has been updated.
+
+The encoding uses _sreg_ number specifiers instead of _xreg_ number specifiers to save encoding space.
+The mapping between them is specified in the pseudo-code below.
+
+[NOTE]
+
+ The _s_ register mapping is taken from the UABI, and may not match the currently unratified EABI. _cm.mva01s.e_ may be included in the future.
+
+Prerequisites::
+None
+
+32-bit equivalent::
+No direct equivalent encoding exists.
+
+Operation::
+[source,sail]
+--
+//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet.
+if (RV32E && (r1sc>1 || r2sc>1)) { + reserved(); +} +xreg1 = {r1sc[2:1]>0,r1sc[2:1]==0,r1sc[2:0]}; +xreg2 = {r2sc[2:1]>0,r2sc[2:1]==0,r2sc[2:0]}; +X[10] = X[xreg1]; +X[11] = X[xreg2]; +-- + +include::Zcmp_footer.adoc[] + diff --git a/src/zc/cm_mvsa01.adoc b/src/zc/cm_mvsa01.adoc new file mode 100644 index 0000000..fd59c85 --- /dev/null +++ b/src/zc/cm_mvsa01.adoc @@ -0,0 +1,65 @@ +<<< +[#insns-cm_mvsa01,reftext="Move a0-a1 into two different s0-s7 registers"] +=== cm.mvsa01 + +Synopsis:: +Move a0-a1 into two registers of s0-s7 + +Mnemonic:: +cm.mvsa01 _r1s'_, _r2s'_ + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 3, name: 'r2s\'', attr: [] }, + { bits: 2, name: 0x1, attr: [] }, + { bits: 3, name: 'r1s\'', attr: [] }, + { bits: 3, name: 0x3, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +[NOTE] + For the encoding to be legal _r1s'_ != _r2s'_. + +Assembly Syntax:: + +[source,sail] +-- +cm.mvsa01 r1s', r2s' +-- + +Description:: +This instruction moves _a0_ into _r1s'_ and _a1_ into _r2s'_. _r1s'_ and _r2s'_ must be different. +The execution is atomic, so it is not possible to observe state where only one of _r1s'_ or _r2s'_ has been updated. + +The encoding uses _sreg_ number specifiers instead of _xreg_ number specifiers to save encoding space. +The mapping between them is specified in the pseudo-code below. + +[NOTE] + + The _s_ register mapping is taken from the UABI, and may not match the currently unratified EABI. _cm.mvsa01.e_ may be included in the future. + +Prerequisites:: +None + +32-bit equivalent:: +No direct equivalent encoding exists. + +Operation:: +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
+if (RV32E && (r1sc>1 || r2sc>1)) { + reserved(); +} +xreg1 = {r1sc[2:1]>0,r1sc[2:1]==0,r1sc[2:0]}; +xreg2 = {r2sc[2:1]>0,r2sc[2:1]==0,r2sc[2:0]}; +X[xreg1] = X[10]; +X[xreg2] = X[11]; +-- + +include::Zcmp_footer.adoc[] + diff --git a/src/zc/cm_pop.adoc b/src/zc/cm_pop.adoc new file mode 100644 index 0000000..30e097e --- /dev/null +++ b/src/zc/cm_pop.adoc @@ -0,0 +1,49 @@ +<<< +[#insns-cm_pop,reftext="Pop registers, deallocate stack frame."] +=== cm.pop + +Synopsis:: +Destroy stack frame: load ra and 0 to 12 saved registers from the stack frame, deallocate the stack frame. + +Mnemonic:: +cm.pop _{reg_list}, stack_adj_ + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 2, name: 'spimm\[5:4\]', attr: [] }, + { bits: 4, name: 'rlist', attr: [] }, + { bits: 5, name: 0x1a, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +[NOTE] + + _rlist_ values 0 to 3 are reserved for a future EABI variant called _cm.pop.e_ + +Assembly Syntax:: + +[source,sail] +-- +cm.pop {reg_list}, stack_adj +cm.pop {xreg_list}, stack_adj +-- + +include::variable_def.adoc[] +include::pushpop_vars.adoc[] + +<<< + +Description:: +This instruction pops (loads) the registers in _reg_list_ from stack memory, +and then adjusts the stack pointer by _stack_adj_. + +include::pushpop_extra_info.adoc[] +include::cm_pop_popret_loads_pseudo_code.adoc[] +include::cm_pop_pseudo_code.adoc[] + +include::Zcmp_footer.adoc[] + diff --git a/src/zc/cm_pop_popret_loads_pseudo_code.adoc b/src/zc/cm_pop_popret_loads_pseudo_code.adoc new file mode 100644 index 0000000..af46b9d --- /dev/null +++ b/src/zc/cm_pop_popret_loads_pseudo_code.adoc @@ -0,0 +1,25 @@ + +Operation:: + +The first section of pseudo-code may be executed multiple times before the instruction successfully completes. + +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
+ +if (XLEN==32) bytes=4; else bytes=8; + +addr=sp+stack_adj-bytes; +for(i in 27,26,25,24,23,22,21,20,19,18,9,8,1) { + //if register i is in xreg_list + if (xreg_list[i]) { + switch(bytes) { + 4: asm("lw x[i], 0(addr)"); + 8: asm("ld x[i], 0(addr)"); + } + addr-=bytes; + } +} +-- + +The final section of pseudo-code executes atomically, and only executes if the section above completes without any exceptions or interrupts. diff --git a/src/zc/cm_pop_pseudo_code.adoc b/src/zc/cm_pop_pseudo_code.adoc new file mode 100644 index 0000000..0cd38a0 --- /dev/null +++ b/src/zc/cm_pop_pseudo_code.adoc @@ -0,0 +1,7 @@ + +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +sp+=stack_adj; +-- diff --git a/src/zc/cm_popret.adoc b/src/zc/cm_popret.adoc new file mode 100644 index 0000000..1150203 --- /dev/null +++ b/src/zc/cm_popret.adoc @@ -0,0 +1,49 @@ +<<< +[#insns-cm_popret,reftext="Pop registers, deallocate stack frame, return."] +=== cm.popret + +Synopsis:: +Destroy stack frame: load ra and 0 to 12 saved registers from the stack frame, deallocate the stack frame, return to ra. + +Mnemonic:: +cm.popret _{reg_list}, stack_adj_ + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 2, name: 'spimm\[5:4\]', attr: [] }, + { bits: 4, name: 'rlist', attr: [] }, + { bits: 5, name: 0x1e, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +[NOTE] + + _rlist_ values 0 to 3 are reserved for a future EABI variant called _cm.popret.e_ + +Assembly Syntax:: + +[source,sail] +-- +cm.popret {reg_list}, stack_adj +cm.popret {xreg_list}, stack_adj +-- + +include::variable_def.adoc[] +include::pushpop_vars.adoc[] + +<<< + +Description:: +This instruction pops (loads) the registers in _reg_list_ from stack memory, + adjusts the stack pointer by _stack_adj_ and then returns to _ra_. 
+ +include::pushpop_extra_info.adoc[] +include::cm_pop_popret_loads_pseudo_code.adoc[] +include::cm_popret_pseudo_code.adoc[] + +include::Zcmp_footer.adoc[] + diff --git a/src/zc/cm_popret_pseudo_code.adoc b/src/zc/cm_popret_pseudo_code.adoc new file mode 100644 index 0000000..ecf60f2 --- /dev/null +++ b/src/zc/cm_popret_pseudo_code.adoc @@ -0,0 +1,9 @@ + + +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +sp+=stack_adj; +asm("ret"); +-- diff --git a/src/zc/cm_popretz.adoc b/src/zc/cm_popretz.adoc new file mode 100644 index 0000000..10ccf35 --- /dev/null +++ b/src/zc/cm_popretz.adoc @@ -0,0 +1,49 @@ +<<< +[#insns-cm_popretz,reftext="Pop registers, deallocate stack frame, return zero."] +=== cm.popretz + +Synopsis:: +Destroy stack frame: load ra and 0 to 12 saved registers from the stack frame, deallocate the stack frame, move zero into a0, return to ra. + +Mnemonic:: +cm.popretz _{reg_list}, stack_adj_ + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 2, name: 'spimm\[5:4\]', attr: [] }, + { bits: 4, name: 'rlist', attr: [] }, + { bits: 5, name: 0x1c, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +[NOTE] + + _rlist_ values 0 to 3 are reserved for a future EABI variant called _cm.popretz.e_ + + +Assembly Syntax:: + +[source,sail] +-- +cm.popretz {reg_list}, stack_adj +cm.popretz {xreg_list}, stack_adj +-- + +include::pushpop_vars.adoc[] + +<<< + +Description:: +This instruction pops (loads) the registers in _reg_list_ from stack memory, + adjusts the stack pointer by _stack_adj_, moves zero into a0 and then returns to _ra_. 
+ +include::pushpop_extra_info.adoc[] +include::cm_pop_popret_loads_pseudo_code.adoc[] +include::cm_popretz_pseudo_code.adoc[] + +include::Zcmp_footer.adoc[] + diff --git a/src/zc/cm_popretz_pseudo_code.adoc b/src/zc/cm_popretz_pseudo_code.adoc new file mode 100644 index 0000000..6aac95c --- /dev/null +++ b/src/zc/cm_popretz_pseudo_code.adoc @@ -0,0 +1,14 @@ + + +[NOTE] + + The _li a0, 0_ *could* be executed more than once, but is included in the atomic section for convenience. + +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +asm("li a0, 0"); +sp+=stack_adj; +asm("ret"); +-- diff --git a/src/zc/cm_push.adoc b/src/zc/cm_push.adoc new file mode 100644 index 0000000..77f1fde --- /dev/null +++ b/src/zc/cm_push.adoc @@ -0,0 +1,48 @@ +<<< +[#insns-cm_push,reftext="Create stack frame: push registers, allocate additional stack space."] +=== cm.push + +Synopsis:: +Create stack frame: store ra and 0 to 12 saved registers to the stack frame, optionally allocate additional stack space. + +Mnemonic:: +cm.push _{reg_list}, -stack_adj_ + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 2, name: 'spimm\[5:4\]', attr: [] }, + { bits: 4, name: 'rlist', attr: [] }, + { bits: 5, name: 0x18, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +[NOTE] + + _rlist_ values 0 to 3 are reserved for a future EABI variant called _cm.push.e_ + +Assembly Syntax:: + +[source,sail] +-- +cm.push {reg_list}, -stack_adj +cm.push {xreg_list}, -stack_adj +-- + +include::variable_def.adoc[] +include::pushpop_vars.adoc[] + +<<< +Description:: +This instruction pushes (stores) the registers in _reg_list_ to the memory below the stack pointer, +and then creates the stack frame by decrementing the stack pointer by _stack_adj_, +including any additional stack space requested by the value of _spimm_. 
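As a rough behavioural sketch of the store-then-adjust sequence described above, the following Python model mirrors the loop order of the pseudo-code (x27 down to x18, then x9, x8, x1). The names `cm_push` and `cm_pop`, the list `x` for the integer registers and the dict-based memory are illustrative assumptions, and traps or re-execution are deliberately not modelled.

```python
# Illustrative model only; names and data structures are assumptions,
# not part of the specification.

# Register numbers in the order the pseudo-code visits them.
ORDER = [27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 9, 8, 1]
SP = 2  # x2 is the stack pointer


def cm_push(x, mem, xreg_list, stack_adj, nbytes=4):
    """Store the listed registers below sp, then decrement sp once."""
    addr = x[SP] - nbytes
    for i in ORDER:
        if i in xreg_list:
            mem[addr] = x[i]
            addr -= nbytes
    x[SP] -= stack_adj  # committed only after every store has been issued


def cm_pop(x, mem, xreg_list, stack_adj, nbytes=4):
    """Load the listed registers from the frame, then increment sp once."""
    addr = x[SP] + stack_adj - nbytes
    for i in ORDER:
        if i in xreg_list:
            x[i] = mem[addr]
            addr -= nbytes
    x[SP] += stack_adj


if __name__ == "__main__":
    regs = [0] * 32
    regs[SP], regs[1], regs[8] = 0x1000, 0xAAAA, 0x8
    memory = {}
    cm_push(regs, memory, {1, 8}, 16)
    regs[1] = regs[8] = 0          # clobber the saved registers
    cm_pop(regs, memory, {1, 8}, 16)
    print(hex(regs[SP]), hex(regs[1]), hex(regs[8]))
```

Because both helpers walk the same register order from the top of the frame downwards, a push followed by a pop with the same list and adjustment reads back exactly the addresses that were written.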
+ +include::pushpop_extra_info.adoc[] +include::cm_push_stores_pseudo_code.adoc[] +include::cm_push_pseudo_code.adoc[] + +include::Zcmp_footer.adoc[] diff --git a/src/zc/cm_push_pseudo_code.adoc b/src/zc/cm_push_pseudo_code.adoc new file mode 100644 index 0000000..8500f0e --- /dev/null +++ b/src/zc/cm_push_pseudo_code.adoc @@ -0,0 +1,7 @@ + +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +sp-=stack_adj; +-- diff --git a/src/zc/cm_push_stores_pseudo_code.adoc b/src/zc/cm_push_stores_pseudo_code.adoc new file mode 100644 index 0000000..46771dd --- /dev/null +++ b/src/zc/cm_push_stores_pseudo_code.adoc @@ -0,0 +1,25 @@ + +Operation:: + +The first section of pseudo-code may be executed multiple times before the instruction successfully completes. + +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +if (XLEN==32) bytes=4; else bytes=8; + +addr=sp-bytes; +for(i in 27,26,25,24,23,22,21,20,19,18,9,8,1) { + //if register i is in xreg_list + if (xreg_list[i]) { + switch(bytes) { + 4: asm("sw x[i], 0(addr)"); + 8: asm("sd x[i], 0(addr)"); + } + addr-=bytes; + } +} +-- + +The final section of pseudo-code executes atomically, and only executes if the section above completes without any exceptions or interrupts. diff --git a/src/zc/cm_sb.adoc b/src/zc/cm_sb.adoc new file mode 100644 index 0000000..265d039 --- /dev/null +++ b/src/zc/cm_sb.adoc @@ -0,0 +1,50 @@ +<<< +[#insns-cm_sb,reftext="Store byte, 16-bit encoding"] +=== cm.sb + +Synopsis:: +Store byte, 16-bit encoding + +Mnemonic:: +cm.sb _rs2'_, _uimm_(_rs1'_) + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x0, attr: ['C0'] }, + { bits: 3, name: 'rs2\'' }, + { bits: 2, name: 'uimm[2:1]' }, + { bits: 3, name: 'rs1\'' }, + { bits: 2, name: 'uimm[0|3]' }, + { bits: 1, name: 0x0 }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... 
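To make the field positions in the diagram above concrete, here is a small, hedged Python sketch that extracts the operands from a 16-bit `cm.sh` word as laid out in this draft. It assumes the usual compressed-register convention (a 3-bit field _n_ selects x(8+_n_)) and the halfword offset layout from `cm_lhsh_imm_offset.adoc`; `decode_cm_sh` is a hypothetical helper name, and the custom-use note for small offsets is not modelled.

```python
# Illustrative decoder for the draft cm.sh layout; the function name is an
# assumption and this is not an official decoder.

def decode_cm_sh(insn):
    """Return (rs2, rs1, uimm) as x-register numbers and a byte offset."""
    quadrant = insn & 0x3          # bits [1:0], C0
    fixed_bit = (insn >> 12) & 0x1  # bit [12], 0x1 for the halfword form
    funct3 = (insn >> 13) & 0x7    # bits [15:13]
    if quadrant != 0x0 or fixed_bit != 0x1 or funct3 != 0x5:
        raise ValueError("not a cm.sh encoding")

    rs2 = 8 + ((insn >> 2) & 0x7)  # rs2' selects one of x8-x15
    rs1 = 8 + ((insn >> 7) & 0x7)  # rs1' selects one of x8-x15

    uimm_4_3 = (insn >> 10) & 0x3  # encoding[11:10]
    uimm_2_1 = (insn >> 5) & 0x3   # encoding[6:5]
    uimm = (uimm_4_3 << 3) | (uimm_2_1 << 1)  # uimm[0] is always 0
    return rs2, rs1, uimm


if __name__ == "__main__":
    # Hand-assembled word: rs2'=2 (x10), rs1'=1 (x9), uimm=18
    print(decode_cm_sh(0xB8A8))
```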
+ +[NOTE] + If _uimm < 4_ the encoding is designated for custom use, as the functionality overlaps with <>. + +include::cm_lbsb_imm_offset.adoc[] + +Description:: +This instruction stores the least significant byte of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. + +[NOTE] + _rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. + +Prerequisites:: +None + +32-bit equivalent:: +<> + +Operation:: +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +mem[X(rs1c)+EXTZ(uimm)][7..0] = X(rs2c) +-- + +include::Zcmb_footer.adoc[] diff --git a/src/zc/cm_sh.adoc b/src/zc/cm_sh.adoc new file mode 100644 index 0000000..fb5e538 --- /dev/null +++ b/src/zc/cm_sh.adoc @@ -0,0 +1,51 @@ +<<< +[#insns-cm_sh,reftext="Store halfword, 16-bit encoding"] +=== cm.sh + +Synopsis:: +Store halfword, 16-bit encoding + +Mnemonic:: +cm.sh _rs2'_, _uimm_(_rs1'_) + +Encoding (RV32, RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x0, attr: ['C0'] }, + { bits: 3, name: 'rs2\'' }, + { bits: 2, name: 'uimm[2:1]' }, + { bits: 3, name: 'rs1\'' }, + { bits: 2, name: 'uimm[4:3]' }, + { bits: 1, name: 0x1 }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +[NOTE] + If _uimm < 4_ the encoding is designated for custom use, as the functionality overlaps with <>. + +include::cm_lhsh_imm_offset.adoc[] + +Description:: +This instruction stores the least significant halfword of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. + +[NOTE] + _rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. + +Prerequisites:: +None + +32-bit equivalent:: +<> + +Operation:: +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
+ +mem[X(rs1c)+EXTZ(uimm)][15..0] = X(rs2c) +-- + +include::Zcmb_footer.adoc[] + diff --git a/src/zc/example.bib b/src/zc/example.bib new file mode 100644 index 0000000..dd4ca0b --- /dev/null +++ b/src/zc/example.bib @@ -0,0 +1,40 @@ +@inproceedings{riscI-isca1981, + title = {{RISC I}: {A} Reduced Instruction Set {VLSI} Computer}, + author = {David A. Patterson and Carlo H. S\'{e}quin}, + booktitle = {ISCA}, + location = {Minneapolis, Minnesota, USA}, + pages = {443-458}, + year = {1981} +} + +@InProceedings{Katevenis:1983, + author = {Katevenis, Manolis G.H. and Sherburne,Jr., Robert W. and Patterson, David A. and S{\'e}quin, Carlo H.}, + title = {The {RISC II} micro-architecture}, + booktitle = {Proceedings VLSI 83 Conference}, + year = 1983, + month = {August}} + +@inproceedings{Ungar:1984, + author = {David Ungar and Ricki Blau and Peter Foley and Dain Samples + and David Patterson}, + title = {Architecture of {SOAR}: {Smalltalk} on a {RISC}}, + booktitle = {ISCA}, + address = {Ann Arbor, MI}, + year = {1984}, + pages = {188--197} +} + +@Article{spur-jsscc1989, + author = {David D. Lee and Shing I. Kong and Mark D. Hill and + George S. Taylor and David A. Hodges and Randy + H. Katz and David A. Patterson}, + title = {A {VLSI} Chip Set for a Multiprocessor + Workstation--{Part I}: An {RISC} Microprocessor with + Coprocessor Interface and Support for Symbolic + Processing}, + journal = {IEEE JSSC}, + year = 1989, + volume = 24, + number = 6, + pages = {1688--1698}, + month = {December}} diff --git a/src/zc/jvt_csr.adoc b/src/zc/jvt_csr.adoc new file mode 100644 index 0000000..9ad2367 --- /dev/null +++ b/src/zc/jvt_csr.adoc @@ -0,0 +1,65 @@ +<<< +[#csrs-jvt,reftext="JVT CSR, table jump base vector and control register"] +=== JVT CSR + +Synopsis:: +Table jump base vector and control register + +Address:: +0x0017 + +Permissions:: +URW + +Format (RV32):: +[wavedrom, , svg] +.... 
+{reg:[ + { bits: 6, name: 'mode', attr: ['6'] }, + { bits: 26, name: 'base[XLEN-1:6] (WARL)', attr: ['XLEN-6'] }, +],config:{bits:32}} +.... + +Format (RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 6, name: 'mode', attr: ['6'] }, + { bits: 58, name: 'base[XLEN-1:6] (WARL)', attr: ['XLEN-6'] }, +],config:{bits:64}} +.... + +Description:: + +The _JVT_ register is an XLEN-bit *WARL* read/write register that holds the jump table configuration, consisting of the jump table base address (BASE) and the jump table mode (MODE). + +If <> is implemented then _JVT_ must also be implemented, but can contain a read-only value. If _JVT_ is writable, the set of values the register may hold can vary by implementation. The value in the BASE field must always be aligned on a 64-byte boundary. + +_JVT.base_ is a virtual address, whenever virtual memory is enabled. + +The memory pointed to by _JVT.base_ is treated as instruction memory for the purpose of executing table jump instructions, implying execute access permission. + +[#JVT-config-table] +._JVT.mode_ definition +[width="60%",options=header] +|============================================================================================= +| JVT.mode | Comment +| 000000 | Jump table mode +| others | *reserved for future standard use* +|============================================================================================= + +_JVT.mode_ is a *WARL* field, so can only be programmed to modes which are implemented. Therefore the discovery mechanism is to +attempt to program different modes and read back the values to see which are available. Jump table mode _must_ be implemented. + +NOTE: in future the RISC-V Unified Discovery method will report the available modes. + +Architectural State:: + +_JVT_ adds architectural state to the system software context (such as an OS process), therefore must be saved/restored on context switches. 
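The _JVT_ field layout and the table address calculation used by the table jump pseudo-code can be sketched as follows. This is only an illustration: it assumes that _JVT.base_ denotes the address obtained by clearing the six low _mode_ bits, and the helper names `jvt_fields` and `table_entry_address` are hypothetical.

```python
# Illustrative sketch; helper names are assumptions, not part of the spec.

MODE_MASK = 0x3F  # JVT.mode occupies bits [5:0]


def jvt_fields(jvt, xlen=32):
    """Split a JVT value into (base address, mode)."""
    jvt &= (1 << xlen) - 1
    mode = jvt & MODE_MASK
    base = jvt & ~MODE_MASK  # 64-byte aligned by construction
    return base, mode


def table_entry_address(jvt, index, xlen=32):
    """Address of jump table entry `index`, following the table jump pseudo-code."""
    base, mode = jvt_fields(jvt, xlen)
    if mode != 0:
        raise ValueError("only jump table mode (000000) is defined")
    shift = 2 if xlen == 32 else 3  # 4-byte entries on RV32, 8-byte on RV64
    return (base + (index << shift)) & ((1 << xlen) - 1)


if __name__ == "__main__":
    print(hex(table_entry_address(0x80001000, 5, xlen=32)))
    print(hex(table_entry_address(0x80001000, 5, xlen=64)))
```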
+ +State Enable:: + +If the Smstateen extension is implemented, then bit 2 in _mstateen0_, _sstateen0_, and _hstateen0_ is implemented. If bit 2 of a controlling _stateen0_ CSR is zero, then access to the _JVT_ CSR and execution of a _cm.jalt_ or _cm.jt_ instruction by a lower privilege level results in an Illegal Instruction trap (or, if appropriate, a Virtual Instruction trap). + +include::Zcmt_footer.adoc[] + diff --git a/src/zc/pushpop.adoc b/src/zc/pushpop.adoc new file mode 100644 index 0000000..e4d61b8 --- /dev/null +++ b/src/zc/pushpop.adoc @@ -0,0 +1,349 @@ +<<< + +[#insns-pushpop,reftext="PUSH/POP Register Instructions"] +== PUSH/POP register instructions + +These instructions are collectively referred to as PUSH/POP: + +* <<#insns-cm_push>> +* <<#insns-cm_pop>> +* <<#insns-cm_popret>> +* <<#insns-cm_popretz>> + +The term PUSH refers to _cm.push_. + +The term POP refers to _cm.pop_. + +The term POPRET refers to _cm.popret and cm.popretz_. + +Common details for these instructions are in this section. + +=== PUSH/POP functional overview + +PUSH, POP, POPRET are used to reduce the size of function prologues and epilogues. + +. The PUSH instruction +** adjusts the stack pointer to create the stack frame +** pushes (stores) the registers specified in the register list to the stack frame + +. The POP instruction +** pops (loads) the registers in the register list from the stack frame +** adjusts the stack pointer to destroy the stack frame + +. The POPRET instructions +** pop (load) the registers in the register list from the stack frame +** _cm.popretz_ also moves zero into _a0_ as the return value +** adjust the stack pointer to destroy the stack frame +** execute a _ret_ instruction to return from the function + +<<< +=== Example usage + +This example gives an illustration of the use of PUSH and POPRET. 
+
+The example uses the function _processMarkers_ from the EMBench benchmark picojpeg, found in the following file on GitHub: https://github.com/embench/embench-iot/blob/master/src/picojpeg/libpicojpeg.c[libpicojpeg.c]
+
+The prologue and epilogue compile with GCC10 to:
+
+[source,SAIL]
+----
+
+ 0001098a :
+ 1098a: 711d addi sp,sp,-96 ;#cm.push(1)
+ 1098c: c8ca sw s2,80(sp) ;#cm.push(2)
+ 1098e: c6ce sw s3,76(sp) ;#cm.push(3)
+ 10990: c4d2 sw s4,72(sp) ;#cm.push(4)
+ 10992: ce86 sw ra,92(sp) ;#cm.push(5)
+ 10994: cca2 sw s0,88(sp) ;#cm.push(6)
+ 10996: caa6 sw s1,84(sp) ;#cm.push(7)
+ 10998: c2d6 sw s5,68(sp) ;#cm.push(8)
+ 1099a: c0da sw s6,64(sp) ;#cm.push(9)
+ 1099c: de5e sw s7,60(sp) ;#cm.push(10)
+ 1099e: dc62 sw s8,56(sp) ;#cm.push(11)
+ 109a0: da66 sw s9,52(sp) ;#cm.push(12)
+ 109a2: d86a sw s10,48(sp);#cm.push(13)
+ 109a4: d66e sw s11,44(sp);#cm.push(14)
+...
+ 109f4: 4501 li a0,0 ;#cm.popretz(1)
+ 109f6: 40f6 lw ra,92(sp) ;#cm.popretz(2)
+ 109f8: 4466 lw s0,88(sp) ;#cm.popretz(3)
+ 109fa: 44d6 lw s1,84(sp) ;#cm.popretz(4)
+ 109fc: 4946 lw s2,80(sp) ;#cm.popretz(5)
+ 109fe: 49b6 lw s3,76(sp) ;#cm.popretz(6)
+ 10a00: 4a26 lw s4,72(sp) ;#cm.popretz(7)
+ 10a02: 4a96 lw s5,68(sp) ;#cm.popretz(8)
+ 10a04: 4b06 lw s6,64(sp) ;#cm.popretz(9)
+ 10a06: 5bf2 lw s7,60(sp) ;#cm.popretz(10)
+ 10a08: 5c62 lw s8,56(sp) ;#cm.popretz(11)
+ 10a0a: 5cd2 lw s9,52(sp) ;#cm.popretz(12)
+ 10a0c: 5d42 lw s10,48(sp);#cm.popretz(13)
+ 10a0e: 5db2 lw s11,44(sp);#cm.popretz(14)
+ 10a10: 6125 addi sp,sp,96 ;#cm.popretz(15)
+ 10a12: 8082 ret ;#cm.popretz(16)
+----
+
+<<<
+
+With the GCC option _-msave-restore_ the output is the following:
+
+[source,SAIL]
+----
+0001080e :
+ 1080e: 73a012ef jal t0,11f48 <__riscv_save_12>
+ 10812: 1101 addi sp,sp,-32
+...
+ 10862: 4501 li a0,0
+ 10864: 6105 addi sp,sp,32
+ 10866: 71e0106f j 11f84 <__riscv_restore_12>
+----
+
+With PUSH/POPRET this reduces to:
+
+[source,SAIL]
+----
+0001080e :
+ 1080e: b8fa cm.push {ra,s0-s11},-96
+...
+ 10866: bcfa cm.popretz {ra,s0-s11}, 96 +---- + +The prologue / epilogue reduce from 60-bytes in the original code, to 14-bytes with _-msave-restore_, +and to 4-bytes with PUSH and POPRET. +As well as reducing the code-size PUSH and POPRET eliminate the branches from +calling the millicode _save/restore_ routines and so may also perform better. + +[NOTE] + + The calls to _/_ become 64-bit when the target functions are out of the ±1MB range, increasing the prologue/epilogue size to 22-bytes. + +[NOTE] + + POP is typically used in tail-calling sequences where _ret_ is not used to return to _ra_ after destroying the stack frame. + +[#pushpop-areg-list] + +==== Stack pointer adjustment handling + +The instructions all automatically adjust the stack pointer by enough to cover the memory required for the registers being saved or restored. +Additionally the _spimm_ field in the encoding allows the stack pointer to be adjusted in additional increments of 16-bytes. There is only a small restricted +range available in the encoding; if the range is insufficient then a separate _c.addi16sp_ can be used to increase the range. + +==== Register list handling + +There is no support for the _{ra, s0-s10}_ register list without also adding _s11_. Therefore the _{ra, s0-s11}_ register list must be used in this case. + +[#pushpop-idempotent-memory] +=== PUSH/POP Fault handling + +Correct execution requires that _sp_ refers to idempotent memory (also see <>), because the core must be able to +handle traps detected during the sequence. +The entire PUSH/POP sequence is re-executed after returning from the trap handler, and multiple traps are possible during the sequence. + +If a trap occurs during the sequence then _xEPC_ is updated with the PC of the instruction, _xTVAL_ (if not read-only-zero) updated with the bad address if it was an access fault and _xCAUSE_ updated with the type of trap. 
+
+NOTE: It is implementation defined whether interrupts can also be taken during the sequence execution.
+
+[#pushpop-software-view]
+=== Software view of execution
+
+==== Software view of the PUSH sequence
+
+From a software perspective the PUSH sequence appears as:
+
+* A sequence of stores writing the bytes required by the pseudo-code
+** The bytes may be written in any order.
+** The bytes may be grouped into larger accesses.
+** Any of the bytes may be written multiple times.
+* A stack pointer adjustment
+
+NOTE: If an implementation allows interrupts during the sequence, and the interrupt handler uses _sp_ to allocate stack memory, then any stores which were executed before the interrupt may be overwritten by the handler. This is safe because the memory is idempotent and the stores will be re-executed when execution resumes.
+
+The stack pointer adjustment must be committed only when it is certain that the entire PUSH instruction will commit.
+
+Stores may also return imprecise faults from the bus.
+It is platform defined whether the core implementation waits for the bus responses before continuing to the final stage of the sequence,
+or handles error responses after completing the PUSH instruction.
+
+<<<
+
+For example:
+
+[source,sail]
+--
+cm.push {ra, s0-s5}, -64
+--
+
+Appears to software as:
+
+[source,sail]
+--
+# any bytes from sp-1 to sp-28 may be written multiple times before
+# the instruction completes, therefore these updates may be visible in
+# the interrupt/exception handler below the stack pointer
+sw s5, -4(sp)
+sw s4, -8(sp)
+sw s3,-12(sp)
+sw s2,-16(sp)
+sw s1,-20(sp)
+sw s0,-24(sp)
+sw ra,-28(sp)
+
+# this must only execute once, and will only execute after all stores
+# completed without any precise faults, therefore this update is only
+# visible in the interrupt/exception handler if cm.push has completed
+addi sp, sp, -64
+--
+
+==== Software view of the POP/POPRET sequence
+
+From a software perspective the POP/POPRET sequence appears as:
+
+* A sequence of loads reading the bytes required by the pseudo-code.
+** The bytes may be loaded in any order.
+** The bytes may be grouped into larger accesses.
+** Any of the bytes may be loaded multiple times.
+* A stack pointer adjustment
+* An optional `li a0, 0`
+* An optional `ret`
+
+If a trap occurs during the sequence, then any loads which were executed before the trap may update architectural state.
+The loads will be re-executed once the trap handler completes, so the values will be overwritten.
+Therefore it is permitted for an implementation to update some of the destination registers before taking a fault.
+
+The optional `li a0, 0`, stack pointer adjustment and optional `ret` must be committed only when it is certain that the entire POP/POPRET instruction will commit.
+
+For POPRET, once the stack pointer adjustment has been committed the `ret` must execute.
+
+<<<
+For example:
+
+[source,sail]
+--
+cm.popretz {ra, s0-s3}, 32;
+--
+
+Appears to software as:
+
+[source,sail]
+--
+# any or all of these load instructions may execute multiple times,
+# therefore these updates may be visible in the interrupt/exception handler
+lw s3, 28(sp)
+lw s2, 24(sp)
+lw s1, 20(sp)
+lw s0, 16(sp)
+lw ra, 12(sp)
+
+# these must only execute once, and will only execute after all loads
+# complete successfully; all instructions must execute atomically,
+# therefore these updates are not visible in the interrupt/exception handler
+li a0, 0
+addi sp, sp, 32
+ret
+--
+
+[[pushpop_non-idem-mem]]
+=== Non-idempotent memory handling
+
+An implementation may have a requirement to issue a PUSH/POP instruction to non-idempotent memory.
+
+If the core implementation does not support PUSH/POP to non-idempotent memories, the core may use an idempotency PMA to detect it and take a
+load (POP/POPRET) or store (PUSH) access fault exception in order to avoid unpredictable results.
+
+Software should only use these instructions on non-idempotent memory regions when software can tolerate the required memory accesses
+being issued repeatedly in the case that they cause exceptions.
+
+<<<
+
+=== Example RV32I PUSH/POP sequences
+
+The examples included here show the load/store series expansion and the stack adjustment.
+Examples of _cm.popret_ and _cm.popretz_ are not included, as the difference in the expanded sequence from _cm.pop_ is trivial in all cases.
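The load/store series expansion for RV32 can also be generated mechanically. The sketch below is a hedged illustration rather than a reference implementation: `reg_list`, `expand_push` and `expand_pop` are hypothetical helper names, only the RV32I register lists are covered, and the caller is assumed to supply a valid _stack_adj_ for the chosen _rlist_.

```python
# Illustrative generator for RV32I expansions; helper names are assumptions.

ABI_NAMES = ["ra"] + [f"s{n}" for n in range(12)]  # ra, s0 .. s11


def reg_list(rlist):
    """Registers selected by rlist (4..15); rlist 15 adds both s10 and s11."""
    if not 4 <= rlist <= 15:
        raise ValueError("rlist values 0 to 3 are reserved")
    count = 13 if rlist == 15 else rlist - 3
    return ABI_NAMES[:count]


def expand_push(rlist, stack_adj, nbytes=4):
    """Stores below sp from the highest register down, then one sp decrement."""
    lines = []
    offset = -nbytes
    for name in reversed(reg_list(rlist)):
        lines.append(f"sw {name}, {offset}(sp)")
        offset -= nbytes
    lines.append(f"addi sp, sp, -{stack_adj}")
    return lines


def expand_pop(rlist, stack_adj, nbytes=4):
    """Loads from the top of the frame downwards, then one sp increment."""
    lines = []
    offset = stack_adj - nbytes
    for name in reversed(reg_list(rlist)):
        lines.append(f"lw {name}, {offset}(sp)")
        offset -= nbytes
    lines.append(f"addi sp, sp, {stack_adj}")
    return lines


if __name__ == "__main__":
    print("\n".join(expand_push(7, 64)))
    print("\n".join(expand_pop(8, 48)))
```

Running it for _rlist_=7 with an adjustment of 64, or _rlist_=8 with 48, yields the same instruction order and offsets as the hand-written sequences in this section.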
+ +==== cm.push {ra, s0-s2}, -64 + +Encoding: _rlist_=7, _spimm_=3 + +expands to: + +[source,sail] +-- +sw s2, -4(sp); +sw s1, -8(sp); +sw s0, -12(sp); +sw ra, -16(sp); +addi sp, sp, -64; +-- + +==== cm.push {ra, s0-s11}, -112 + +Encoding: _rlist_=15, _spimm_=3 + +expands to: + +[source,sail] +-- +sw s11, -4(sp); +sw s10, -8(sp); +sw s9, -12(sp); +sw s8, -16(sp); +sw s7, -20(sp); +sw s6, -24(sp); +sw s5, -28(sp); +sw s4, -32(sp); +sw s3, -36(sp); +sw s2, -40(sp); +sw s1, -44(sp); +sw s0, -48(sp); +sw ra, -52(sp); +addi sp, sp, -112; +-- + +<<< + +==== cm.pop {ra}, 16 + +Encoding: _rlist_=4, _spimm_=0 + +expands to: + +[source,sail] +-- +lw ra, 12(sp); +addi sp, sp, 16; +-- + +==== cm.pop {ra, s0-s3}, 48 + +Encoding: _rlist_=8, _spimm_=1 + +expands to: + +[source,sail] +-- +lw s3, 44(sp); +lw s2, 40(sp); +lw s1, 36(sp); +lw s0, 32(sp); +lw ra, 28(sp); +addi sp, sp, 48; +-- + +==== cm.pop {ra, s0-s4}, 64 + +Encoding: _rlist_=9, _spimm_=2 + +expands to: + +[source,sail] +-- +lw s4, 60(sp); +lw s3, 56(sp); +lw s2, 52(sp); +lw s1, 48(sp); +lw s0, 44(sp); +lw ra, 40(sp); +addi sp, sp, 64; +-- + +include::Zcmp_footer.adoc[] diff --git a/src/zc/pushpop_extra_info.adoc b/src/zc/pushpop_extra_info.adoc new file mode 100644 index 0000000..52bf69c --- /dev/null +++ b/src/zc/pushpop_extra_info.adoc @@ -0,0 +1,22 @@ + +[NOTE] + + All ABI register mappings are for the UABI. An EABI version is planned once the EABI is frozen. + +For further information see <>. + +Stack Adjustment Calculation:: + +_stack_adj_base_ is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list. + +_spimm_ is the number of additional 16-byte address increments allocated for the stack frame. + +The total stack adjustment represents the total size of the stack frame, which is _stack_adj_base_ added to _spimm_ scaled by 16, +as defined above. 
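The calculation above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `stack_adj` and the table constants are hypothetical, the _stack_adj_base_ values are copied from the RV32I and RV64 tables in `pushpop_vars.adoc`, and RV32E is left out.

```python
# Illustrative sketch; names are assumptions. The base values follow the
# RV32I and RV64 tables for stack_adj_base (RV32E is not covered).

# (lowest rlist, highest rlist, stack_adj_base)
RV32I_BASE = [(4, 7, 16), (8, 11, 32), (12, 14, 48), (15, 15, 64)]
RV64_BASE = [
    (4, 5, 16), (6, 7, 32), (8, 9, 48), (10, 11, 64),
    (12, 13, 80), (14, 14, 96), (15, 15, 112),
]


def stack_adj(rlist, spimm_field, xlen=32):
    """Total stack adjustment: stack_adj_base + spimm[5:4] * 16."""
    if not 0 <= spimm_field <= 3:
        raise ValueError("spimm[5:4] is a 2-bit field")
    table = RV32I_BASE if xlen == 32 else RV64_BASE
    for low, high, base in table:
        if low <= rlist <= high:
            return base + spimm_field * 16
    raise ValueError("reserved rlist value")


if __name__ == "__main__":
    print(stack_adj(7, 3))             # RV32I, {ra, s0-s2}
    print(stack_adj(15, 3, xlen=64))   # RV64, {ra, s0-s11}
```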
+ +Prerequisites:: +None + +32-bit equivalent:: +No direct equivalent encoding exists + diff --git a/src/zc/pushpop_vars.adoc b/src/zc/pushpop_vars.adoc new file mode 100644 index 0000000..ce25524 --- /dev/null +++ b/src/zc/pushpop_vars.adoc @@ -0,0 +1,91 @@ + +[source,sail] +-- +RV32E: + +switch (rlist){ + case 4: {reg_list="ra"; xreg_list="x1";} + case 5: {reg_list="ra, s0"; xreg_list="x1, x8";} + case 6: {reg_list="ra, s0-s1"; xreg_list="x1, x8-x9";} + default: reserved(); +} +stack_adj = stack_adj_base + spimm[5:4] * 16; +-- + +[source,sail] +-- +RV32I, RV64: + +switch (rlist){ + case 4: {reg_list="ra"; xreg_list="x1";} + case 5: {reg_list="ra, s0"; xreg_list="x1, x8";} + case 6: {reg_list="ra, s0-s1"; xreg_list="x1, x8-x9";} + case 7: {reg_list="ra, s0-s2"; xreg_list="x1, x8-x9, x18";} + case 8: {reg_list="ra, s0-s3"; xreg_list="x1, x8-x9, x18-x19";} + case 9: {reg_list="ra, s0-s4"; xreg_list="x1, x8-x9, x18-x20";} + case 10: {reg_list="ra, s0-s5"; xreg_list="x1, x8-x9, x18-x21";} + case 11: {reg_list="ra, s0-s6"; xreg_list="x1, x8-x9, x18-x22";} + case 12: {reg_list="ra, s0-s7"; xreg_list="x1, x8-x9, x18-x23";} + case 13: {reg_list="ra, s0-s8"; xreg_list="x1, x8-x9, x18-x24";} + case 14: {reg_list="ra, s0-s9"; xreg_list="x1, x8-x9, x18-x25";} + //note - to include s10, s11 must also be included + case 15: {reg_list="ra, s0-s11"; xreg_list="x1, x8-x9, x18-x27";} + default: reserved(); +} +stack_adj = stack_adj_base + spimm[5:4] * 16; +-- + +[source,sail] +-- +RV32E: + +stack_adj_base = 16; +Valid values: +stack_adj = [16|32|48|64]; +-- + +[source,sail] +-- +RV32I: + +switch (rlist) { + case 4.. 7: stack_adj_base = 16; + case 8..11: stack_adj_base = 32; + case 12..14: stack_adj_base = 48; + case 15: stack_adj_base = 64; +} + +Valid values: +switch (rlist) { + case 4.. 
7: stack_adj = [16|32|48| 64]; + case 8..11: stack_adj = [32|48|64| 80]; + case 12..14: stack_adj = [48|64|80| 96]; + case 15: stack_adj = [64|80|96|112]; +} +-- + +[source,sail] +-- +RV64: + +switch (rlist) { + case 4.. 5: stack_adj_base = 16; + case 6.. 7: stack_adj_base = 32; + case 8.. 9: stack_adj_base = 48; + case 10..11: stack_adj_base = 64; + case 12..13: stack_adj_base = 80; + case 14: stack_adj_base = 96; + case 15: stack_adj_base = 112; +} + +Valid values: +switch (rlist) { + case 4.. 5: stack_adj = [ 16| 32| 48| 64]; + case 6.. 7: stack_adj = [ 32| 48| 64| 80]; + case 8.. 9: stack_adj = [ 48| 64| 80| 96]; + case 10..11: stack_adj = [ 64| 80| 96|112]; + case 12..13: stack_adj = [ 80| 96|112|128]; + case 14: stack_adj = [ 96|112|128|144]; + case 15: stack_adj = [112|128|144|160]; +} +-- diff --git a/src/zc/readme.md b/src/zc/readme.md new file mode 100644 index 0000000..8a333e7 --- /dev/null +++ b/src/zc/readme.md @@ -0,0 +1,15 @@ +This directory has the latest draft specification for the Zc extensions, without the PDF build. + +To see the latest built version go to: + +https://github.com/riscv/riscv-code-size-reduction/tags + +The benchmarking results for all Zc extensions are here: + +https://docs.google.com/spreadsheets/d/1bFMyGkuuulBXuIaMsjBINoCWoLwObr1l9h5TAWN8s7k/edit#gid=21966619 + +There are many changes since v0.50.1, which has been used for toolchain, spike, qemu and the CV32E41P implementation. + +This shows how the specification has changed from v0.50.1 to the current version: + +https://github.com/riscv/riscv-code-size-reduction/blob/master/Zc-specification/changes_since_v0.50.adoc diff --git a/src/zc/tablejump.adoc b/src/zc/tablejump.adoc new file mode 100644 index 0000000..fefa8fc --- /dev/null +++ b/src/zc/tablejump.adoc @@ -0,0 +1,49 @@ +<<< + +[#insns-tablejump,reftext="Table Jump Overview"] +== Table Jump Overview + +_cm.jt_ (<<#insns-cm_jt>>) and _cm.jalt_ (<<#insns-cm_jalt>>) are referred to as table jump. 
+ +Table jump uses a 256-entry XLEN wide table in instruction memory to contain function addresses. +The table must be at least 64-byte aligned. + +Table entries follow the current data endianness. This is different from normal instruction fetch, which is always little-endian. + +_cm.jt_ and _cm.jalt_ encodings index the table, giving access to functions within the full XLEN wide address space. + +This is used as a form of dictionary compression to reduce the code size of _jal_ / _auipc+jalr_ / _jr_ / _auipc+jr_ instructions. + +Table jump allows the linker to replace the following instruction sequences with a _cm.jt_ or _cm.jalt_ encoding, and an entry in the table: + +* 32-bit _j_ calls +* 32-bit _jal_ ra calls +* 64-bit _auipc+jr_ calls to fixed locations +* 64-bit _auipc+jalr ra_ calls to fixed locations +** The _auipc+jr/jalr_ sequence is used because the offset from the PC is out of the ±1MB range. + +If a return address stack is implemented, then as _cm.jalt_ is equivalent to _jal ra_, it pushes to the stack. + +=== JVT + +The base of the table is in the JVT CSR (see <>); each table entry is XLEN bits. + +If the same function is called with and without linking then it must have two entries in the table. +This is typically caused by the same function being called with and without tail calling. + +[#tablejump-fault-handling] +=== Table Jump Fault handling + +For a table jump instruction, the table entry that the instruction selects is considered an extension of the instruction itself. +Hence, the execution of a table jump instruction involves two instruction fetches, the first to read the instruction (_cm.jt_/_cm.jalt_) +and the second to read from the jump vector table (JVT). Both instruction fetches are _implicit_ reads, and both require +execute permission; read permission is irrelevant. It is recommended that the second fetch be ignored for hardware triggers and breakpoints.
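The address of the second fetch can be sketched as below. This is an illustrative Python sketch, not normative text: it assumes the entry address is the table base plus the index scaled by the entry size (XLEN/8 bytes), which follows from the table being 256 entries of XLEN bits each, and it takes the base as an already-extracted, 64-byte-aligned address. The function name `table_jump_entry` is hypothetical. The split between _cm.jt_ (index below 32) and _cm.jalt_ (index 32 and above) is taken from the instruction pages.

```python
# Illustrative sketch only: the function name and argument layout are hypothetical.
def table_jump_entry(jvt_base, index, xlen=32):
    """Return (mnemonic, address of the table entry fetched by the jump)."""
    if xlen not in (32, 64):
        raise ValueError("this sketch covers XLEN of 32 or 64 only")
    if jvt_base % 64 != 0:
        raise ValueError("the table must be at least 64-byte aligned")
    if not 0 <= index < 256:
        raise ValueError("the table has 256 entries")
    entry_bytes = xlen // 8  # each table entry is XLEN bits wide
    entry_addr = jvt_base + index * entry_bytes
    # Index values below 32 decode as cm.jt, the rest as cm.jalt (which links ra).
    mnemonic = "cm.jt" if index < 32 else "cm.jalt"
    return mnemonic, entry_addr
```

With a base of `0x1000` on RV32, index 32 selects the entry at `0x1080` and decodes as _cm.jalt_.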
+ +Memory writes to the jump vector table require an instruction barrier (_fence.i_) to guarantee that they are visible to the instruction fetch. + +Multiple contexts may have different jump vector tables. JVT may be switched between them without an instruction barrier +if the tables have not been updated in memory since the last _fence.i_. + +If an exception occurs on either instruction fetch, xEPC is set to the PC of the table jump instruction, xCAUSE is set as expected for the type of fault and xTVAL (if not set to zero) contains the fetch address which caused the fault. + +include::Zcmt_footer.adoc[] diff --git a/src/zc/variable_def.adoc b/src/zc/variable_def.adoc new file mode 100644 index 0000000..a660cac --- /dev/null +++ b/src/zc/variable_def.adoc @@ -0,0 +1 @@ +The variables used in the assembly syntax are defined below. -- cgit v1.1 From 2f48395ba9fd2fabb3d1173d6d76574f4f7f85f7 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Wed, 31 Jan 2024 14:55:42 -0500 Subject: Moved Zc to after tso Moved Zc chapter to land after tso chapter. Started cleaning up asciidoc. 
--- src/riscv-unprivileged.adoc | 4 +++- src/zc/Zc.adoc | 7 +++++-- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/src/riscv-unprivileged.adoc b/src/riscv-unprivileged.adoc index 4a5bab8..9a0cf06 100644 --- a/src/riscv-unprivileged.adoc +++ b/src/riscv-unprivileged.adoc @@ -126,8 +126,10 @@ include::zfa.adoc[] //zfa.tex include::ztso-st-ext.adoc[] //ztso.tex -include::rv-32-64g.adoc[] + include::zc/Zc.adoc[] + +include::rv-32-64g.adoc[] //gmaps.tex include::extending.adoc[] //extensions.tex diff --git a/src/zc/Zc.adoc b/src/zc/Zc.adoc index 137824f..31c72c2 100644 --- a/src/zc/Zc.adoc +++ b/src/zc/Zc.adoc @@ -133,14 +133,17 @@ MISA.C is set if the following extensions are selected: * Zca, Zcd if D is specified (RV64 only) ** this configuration excludes Zcmp, Zcmt -[#Zca] +[#Zca,Zca] === Zca The Zca extension is added as way to refer to instructions in the C extension that do not include the floating-point loads and stores. Therefore it _excluded_ all 16-bit floating point loads and stores: _c.flw_, _c.flwsp_, _c.fsw_, _c.fswsp_, _c.fld_, _c.fldsp_, _c.fsd_, _c.fsdsp_. -NOTE: the C extension only includes F/D instructions when D and F are also specified +[NOTE] +==== +The C extension only includes F/D instructions when D and F are also specified. +==== [#Zcf] === Zcf (RV32 only) -- cgit v1.1 From a6202d693be24b175ca8a008625788a37fc2dd90 Mon Sep 17 00:00:00 2001 From: wmat Date: Mon, 5 Feb 2024 16:45:50 -0500 Subject: Lowering the heading level by one. Added another equal sign to the heading level to set it correctly.
--- src/zc/Zc.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/zc/Zc.adoc b/src/zc/Zc.adoc index 31c72c2..ee71a73 100644 --- a/src/zc/Zc.adoc +++ b/src/zc/Zc.adoc @@ -365,7 +365,7 @@ Several instructions in this specification use the following new instruction for NOTE: c.mul uses the existing CA format [#Zcb_instructions] -== Zcb instructions +=== Zcb instructions include::c_lbu.adoc[] include::c_lhu.adoc[] -- cgit v1.1 From ce71c0593505ef7399ce6047a619aaa2ab30f81c Mon Sep 17 00:00:00 2001 From: wmat Date: Mon, 5 Feb 2024 16:48:16 -0500 Subject: Lowering heading level by one. Added another equal sign to the header to make it at the correct level. --- src/zc/c_lbu.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/zc/c_lbu.adoc b/src/zc/c_lbu.adoc index 3928373..1598d08 100644 --- a/src/zc/c_lbu.adoc +++ b/src/zc/c_lbu.adoc @@ -1,6 +1,6 @@ <<< [#insns-c_lbu,reftext="Load unsigned byte, 16-bit encoding"] -=== c.lbu +==== c.lbu Synopsis:: Load unsigned byte, 16-bit encoding -- cgit v1.1 From a7bfa2ef9cd1e9e28f725f173d699675f8c90b52 Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 6 Feb 2024 11:01:41 -0500 Subject: Adjust heading levels and fix notes. Adjusted heading levels to fit into overall document. Changed Notes to the standard within the overall document. 
--- src/zc/c_lbu.adoc | 2 ++ src/zc/c_lh.adoc | 4 +++- src/zc/c_lhu.adoc | 4 +++- src/zc/c_mul.adoc | 4 +++- src/zc/c_not.adoc | 4 +++- src/zc/c_sb.adoc | 4 +++- src/zc/c_sext_b.adoc | 4 +++- src/zc/c_sext_h.adoc | 4 +++- src/zc/c_sh.adoc | 4 +++- src/zc/c_zext_b.adoc | 7 +++++-- src/zc/c_zext_h.adoc | 4 +++- src/zc/c_zext_w.adoc | 4 +++- src/zc/changes_since_v0.50.adoc | 16 +++++++-------- src/zc/cm_decbnez.adoc | 5 +++-- src/zc/cm_jalt.adoc | 6 +++--- src/zc/cm_jt.adoc | 6 +++--- src/zc/cm_lb.adoc | 4 +++- src/zc/cm_lbu.adoc | 4 +++- src/zc/cm_lh.adoc | 4 +++- src/zc/cm_lhu.adoc | 6 +++++- src/zc/cm_mva01s.adoc | 5 +++-- src/zc/cm_mvsa01.adoc | 7 +++++-- src/zc/cm_pop.adoc | 6 +++--- src/zc/cm_popret.adoc | 5 +++-- src/zc/cm_popretz.adoc | 6 +++--- src/zc/cm_push.adoc | 5 +++-- src/zc/cm_sb.adoc | 6 +++++- src/zc/cm_sh.adoc | 6 +++++- src/zc/jvt_csr.adoc | 7 +++++-- src/zc/pushpop.adoc | 43 +++++++++++++++++++++++------------------ src/zc/pushpop_extra_info.adoc | 3 ++- src/zc/tablejump.adoc | 6 +++--- 32 files changed, 132 insertions(+), 73 deletions(-) diff --git a/src/zc/c_lbu.adoc b/src/zc/c_lbu.adoc index 1598d08..06ad04b 100644 --- a/src/zc/c_lbu.adoc +++ b/src/zc/c_lbu.adoc @@ -27,7 +27,9 @@ Description:: This instruction loads a byte from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting byte is zero extended to XLEN bits and is written to _rd'_. [NOTE] +==== _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. +==== Prerequisites:: None diff --git a/src/zc/c_lh.adoc b/src/zc/c_lh.adoc index e519754..e89705a 100644 --- a/src/zc/c_lh.adoc +++ b/src/zc/c_lh.adoc @@ -1,6 +1,6 @@ <<< [#insns-c_lh,reftext="Load signed halfword, 16-bit encoding"] -=== c.lh +==== c.lh Synopsis:: Load signed halfword, 16-bit encoding @@ -28,7 +28,9 @@ Description:: This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. 
The resulting halfword is sign extended to XLEN bits and is written to _rd'_. [NOTE] +==== _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. +==== Prerequisites:: None diff --git a/src/zc/c_lhu.adoc b/src/zc/c_lhu.adoc index 6db5211..e6193fc 100644 --- a/src/zc/c_lhu.adoc +++ b/src/zc/c_lhu.adoc @@ -1,6 +1,6 @@ <<< [#insns-c_lhu,reftext="Load unsigned halfword, 16-bit encoding"] -=== c.lhu +==== c.lhu Synopsis:: Load unsigned halfword, 16-bit encoding @@ -28,7 +28,9 @@ Description:: This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting halfword is zero extended to XLEN bits and is written to _rd'_. [NOTE] +==== _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. +==== Prerequisites:: None diff --git a/src/zc/c_mul.adoc b/src/zc/c_mul.adoc index d2f5a21..5ab6aeb 100644 --- a/src/zc/c_mul.adoc +++ b/src/zc/c_mul.adoc @@ -1,6 +1,6 @@ <<< [#insns-c_mul,reftext="Multiply, 16-bit encoding"] -=== c.mul +==== c.mul Synopsis:: Multiply, 16-bit encoding @@ -25,7 +25,9 @@ Description:: This instruction multiplies XLEN bits of the source operands from _rsd'_ and _rs2'_ and writes the lowest XLEN bits of the result to _rsd'_. [NOTE] +==== _rd'/rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. +==== Prerequisites:: M or Zmmul must be configured. diff --git a/src/zc/c_not.adoc b/src/zc/c_not.adoc index 4207ba0..9a0bbd9 100644 --- a/src/zc/c_not.adoc +++ b/src/zc/c_not.adoc @@ -1,6 +1,6 @@ <<< [#insns-c_not,reftext="Bitwise not, 16-bit encoding"] -=== c.not +==== c.not Synopsis:: Bitwise not, 16-bit encoding @@ -25,7 +25,9 @@ Description:: This instruction takes the one's complement of _rd'/rs1'_ and writes the result to the same register. [NOTE] +==== _rd'/rs1'_ is from the standard 8-register set x8-x15. 
+==== Prerequisites:: None diff --git a/src/zc/c_sb.adoc b/src/zc/c_sb.adoc index d0b1ac6..395d27a 100644 --- a/src/zc/c_sb.adoc +++ b/src/zc/c_sb.adoc @@ -1,6 +1,6 @@ <<< [#insns-c_sb,reftext="Store byte, 16-bit encoding"] -=== c.sb +==== c.sb Synopsis:: Store byte, 16-bit encoding @@ -27,7 +27,9 @@ Description:: This instruction stores the least significant byte of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. [NOTE] +==== _rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. +==== Prerequisites:: None diff --git a/src/zc/c_sext_b.adoc b/src/zc/c_sext_b.adoc index bcf8f15..2be52d0 100644 --- a/src/zc/c_sext_b.adoc +++ b/src/zc/c_sext_b.adoc @@ -1,6 +1,6 @@ <<< [#insns-c_sext_b,reftext="Sign extend byte, 16-bit encoding"] -=== c.sext.b +==== c.sext.b Synopsis:: Sign extend byte, 16-bit encoding @@ -27,7 +27,9 @@ It sign-extends the least-significant byte in the operand to XLEN bits by copyin in the byte (i.e., bit 7) to all of the more-significant bits. [NOTE] +==== _rd'/rs1'_ is from the standard 8-register set x8-x15. +==== Prerequisites:: Zbb is also required. diff --git a/src/zc/c_sext_h.adoc b/src/zc/c_sext_h.adoc index 82a64db..28a8ebe 100644 --- a/src/zc/c_sext_h.adoc +++ b/src/zc/c_sext_h.adoc @@ -1,6 +1,6 @@ <<< [#insns-c_sext_h,reftext="Sign extend halfword, 16-bit encoding"] -=== c.sext.h +==== c.sext.h Synopsis:: Sign extend halfword, 16-bit encoding @@ -27,7 +27,9 @@ It sign-extends the least-significant halfword in the operand to XLEN bits by co in the halfword (i.e., bit 15) to all of the more-significant bits. [NOTE] +==== _rd'/rs1'_ is from the standard 8-register set x8-x15. +==== Prerequisites:: Zbb is also required. 
diff --git a/src/zc/c_sh.adoc b/src/zc/c_sh.adoc index 977a887..992bed3 100644 --- a/src/zc/c_sh.adoc +++ b/src/zc/c_sh.adoc @@ -1,6 +1,6 @@ <<< [#insns-c_sh,reftext="Store halfword, 16-bit encoding"] -=== c.sh +==== c.sh Synopsis:: Store halfword, 16-bit encoding @@ -28,7 +28,9 @@ Description:: This instruction stores the least significant halfword of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. [NOTE] +==== _rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. +==== Prerequisites:: None diff --git a/src/zc/c_zext_b.adoc b/src/zc/c_zext_b.adoc index 500461d..c13d39f 100644 --- a/src/zc/c_zext_b.adoc +++ b/src/zc/c_zext_b.adoc @@ -1,6 +1,6 @@ <<< [#insns-c_zext_b,reftext="Zero extend byte, 16-bit encoding"] -=== c.zext.b +==== c.zext.b Synopsis:: Zero extend byte, 16-bit encoding @@ -27,7 +27,9 @@ It zero-extends the least-significant byte of the operand to XLEN bits by insert the bits more significant than 7. [NOTE] +==== _rd'/rs1'_ is from the standard 8-register set x8-x15. +==== Prerequisites:: None @@ -39,8 +41,9 @@ andi rd'/rs1', rd'/rs1', 0xff -- [NOTE] - +==== The SAIL module variable for _rd'/rs1'_ is called _rsdc_. +==== Operation:: [source,sail] diff --git a/src/zc/c_zext_h.adoc b/src/zc/c_zext_h.adoc index 5999857..29a31a2 100644 --- a/src/zc/c_zext_h.adoc +++ b/src/zc/c_zext_h.adoc @@ -1,6 +1,6 @@ <<< [#insns-c_zext_h,reftext="Zero extend halfword, 16-bit encoding"] -=== c.zext.h +==== c.zext.h Synopsis:: Zero extend halfword, 16-bit encoding @@ -27,7 +27,9 @@ It zero-extends the least-significant halfword of the operand to XLEN bits by in the bits more significant than 15. [NOTE] +==== _rd'/rs1'_ is from the standard 8-register set x8-x15. +==== Prerequisites:: Zbb is also required. 
diff --git a/src/zc/c_zext_w.adoc b/src/zc/c_zext_w.adoc index 3540405..35684f9 100644 --- a/src/zc/c_zext_w.adoc +++ b/src/zc/c_zext_w.adoc @@ -1,6 +1,6 @@ <<< [#insns-c_zext_w,reftext="Zero extend word, 16-bit encoding"] -=== c.zext.w +==== c.zext.w Synopsis:: Zero extend word, 16-bit encoding @@ -27,7 +27,9 @@ It zero-extends the least-significant word of the operand to XLEN bits by insert the bits more significant than 31. [NOTE] +==== _rd'/rs1'_ is from the standard 8-register set x8-x15. +==== Prerequisites:: Zba is also required. diff --git a/src/zc/changes_since_v0.50.adoc b/src/zc/changes_since_v0.50.adoc index a4452b1..b40626c 100644 --- a/src/zc/changes_since_v0.50.adoc +++ b/src/zc/changes_since_v0.50.adoc @@ -3,7 +3,7 @@ There are many changes since v0.50.1, which has been used for toolchain, spike, The status of all of the instructions are in the tables. Note that _all_ subsets have been redefined. -= Load/store +=== Load/store .Load/store [options="header",width=100%] @@ -22,7 +22,7 @@ The status of all of the instructions are in the tables. Note that _all_ subsets | N/A | C.SH | N/A | N/A | CM.SH with shorter uimm |==================================================================================== -= Table jump +=== Table jump .Table Jump [options="header",width=100%] @@ -35,7 +35,7 @@ The status of all of the instructions are in the tables. Note that _all_ subsets See this [commit](https://github.com/riscv/riscv-code-size-reduction/commit/8ba5b0fdf05d6fd5af118ba5301910d049abd1a8#diff-8d03bd23cf9ec0eb75984f7c6d4181aa9548acb5898dc9159514e24398076836) for the change in the table jump exception model. -= Double move +=== Double move .Double move [options="header",width=100%] @@ -47,7 +47,7 @@ See this [commit](https://github.com/riscv/riscv-code-size-reduction/commit/8ba5 Note that the .E extension versions for the EABI will be specified in the future, and cannot yet be confirmed as the EABI is not frozen. 
-= Simple instructions +=== Simple instructions .Simple instructions [options="header",width=100%] @@ -62,7 +62,7 @@ Note that the .E extension versions for the EABI will be specified in the future | C.MUL | same | N | N | unchanged |==================================================================================== -= Push/pop +=== Push/pop All 32-bit forms are removed and all the 16-bit forms support 12 register lists (excluding {ra, s0-s10}): @@ -93,7 +93,7 @@ Note that the .E extension versions for the EABI will be specified in the future | C.POPRET | CM.POPRETZ | Y | Y | separate encoding for return zero |==================================================================================== -= Instructions in v0.50 but *not* in v0.70 +=== Instructions in v0.50 but *not* in v0.70 These instructions can be left in the compiler as experimental, enabled with the following switches: @@ -110,14 +110,14 @@ These instructions can be left in the compiler as experimental, enabled with the | -mzce-decbnez | DECBNEZ |============================================================================== -== 16-bit Instructions +==== 16-bit Instructions C.DECBNEZ - the encoding space for this has been used by all the CM.* instructions. Therefore this instruction must be disabled in the compiler - unless an encoding is proposed. C.NEG - this is not very useful and can be deleted. -== 32-bit Instructions +==== 32-bit Instructions MULI - This is in custom-0, so can be kept unchanged. Early benchmarking results suggest it's not much use, and the encoding is expensive so it's unlikely to ever be included in an extension. 
diff --git a/src/zc/cm_decbnez.adoc b/src/zc/cm_decbnez.adoc index 912b768..6dbbd77 100644 --- a/src/zc/cm_decbnez.adoc +++ b/src/zc/cm_decbnez.adoc @@ -1,6 +1,6 @@ <<< [#insns-cm_decbnez,reftext="Decrement and branch, 16-bit encoding"] -=== cm.decbnez: This is in the _development_ phase, for benchmarking and prototyping only +==== cm.decbnez: This is in the _development_ phase, for benchmarking and prototyping only Synopsis:: Decrement and branch, 16-bit encoding @@ -22,8 +22,9 @@ Encoding (RV32, RV64):: .... [NOTE] - +==== In the current proposal only t0 can be decremented, future versions may allow more registers +==== Description:: This instruction decrements _t0_, and increments the PC by the sign extended immediate if _t0_ is zero *after* the decrement. diff --git a/src/zc/cm_jalt.adoc b/src/zc/cm_jalt.adoc index 372d933..9a5c392 100644 --- a/src/zc/cm_jalt.adoc +++ b/src/zc/cm_jalt.adoc @@ -1,6 +1,6 @@ <<< [#insns-cm_jalt,reftext="Jump and link via table"] -=== cm.jalt +==== cm.jalt Synopsis:: jump via table with optional link @@ -20,9 +20,9 @@ Encoding (RV32, RV64):: .... [NOTE] - +==== For this encoding to decode as _cm.jalt_, _index>=32_, otherwise it decodes as _cm.jt_, see <>. - +==== [NOTE] If JVT.mode = 0 (Jump Table Mode) then _cm.jalt_ behaves as specified here. If JVT.mode is a reserved value, then _cm.jalt_ is also reserved. In the future other defined values of JVT.mode may change the behaviour of _cm.jalt_. diff --git a/src/zc/cm_jt.adoc b/src/zc/cm_jt.adoc index 8c7f67d..ba7e41c 100644 --- a/src/zc/cm_jt.adoc +++ b/src/zc/cm_jt.adoc @@ -1,6 +1,6 @@ <<< [#insns-cm_jt,reftext="Jump via table"] -=== cm.jt +==== cm.jt Synopsis:: jump via table @@ -20,9 +20,9 @@ Encoding (RV32, RV64):: .... [NOTE] - +==== For this encoding to decode as _cm.jt_, _index<32_, otherwise it decodes as _cm.jalt_, see <>. - +==== [NOTE] If JVT.mode = 0 (Jump Table Mode) then _cm.jt_ behaves as specified here. If JVT.mode is a reserved value, then _cm.jt_ is also reserved. 
In the future other defined values of JVT.mode may change the behaviour of _cm.jt_. diff --git a/src/zc/cm_lb.adoc b/src/zc/cm_lb.adoc index 525ba97..4aefffc 100644 --- a/src/zc/cm_lb.adoc +++ b/src/zc/cm_lb.adoc @@ -1,6 +1,6 @@ <<< [#insns-cm_lb,reftext="Load signed byte, 16-bit encoding"] -=== cm.lb +==== cm.lb Synopsis:: Load signed byte, 16-bit encoding @@ -28,7 +28,9 @@ Description:: This instruction loads a byte from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting byte is sign extended to XLEN bits and is written to _rd'_. [NOTE] +==== _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. +==== Prerequisites:: None diff --git a/src/zc/cm_lbu.adoc b/src/zc/cm_lbu.adoc index 7e9735f..601ce3f 100644 --- a/src/zc/cm_lbu.adoc +++ b/src/zc/cm_lbu.adoc @@ -1,6 +1,6 @@ <<< [#insns-cm_lbu,reftext="Load unsigned byte, 16-bit encoding"] -=== cm.lbu +==== cm.lbu Synopsis:: Load unsigned byte, 16-bit encoding @@ -31,7 +31,9 @@ Description:: This instruction loads a byte from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting byte is zero extended to XLEN bits and is written to _rd'_. [NOTE] +==== _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. +==== Prerequisites:: None diff --git a/src/zc/cm_lh.adoc b/src/zc/cm_lh.adoc index bb1b6b9..4a23050 100644 --- a/src/zc/cm_lh.adoc +++ b/src/zc/cm_lh.adoc @@ -1,6 +1,6 @@ <<< [#insns-cm_lh,reftext="Load signed halfword, 16-bit encoding"] -=== cm.lh +==== cm.lh Synopsis:: Load signed halfword, 16-bit encoding @@ -31,7 +31,9 @@ Description:: This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting halfword is sign extended to XLEN bits and is written to _rd'_. [NOTE] +==== _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. 
+==== Prerequisites:: None diff --git a/src/zc/cm_lhu.adoc b/src/zc/cm_lhu.adoc index 3a3c281..6818ef1 100644 --- a/src/zc/cm_lhu.adoc +++ b/src/zc/cm_lhu.adoc @@ -1,6 +1,6 @@ <<< [#insns-cm_lhu,reftext="Load unsigned halfword, 16-bit encoding"] -=== cm.lhu +==== cm.lhu Synopsis:: Load unsigned halfword, 16-bit encoding @@ -23,7 +23,9 @@ Encoding (RV32, RV64):: .... [NOTE] +==== If _uimm < 4_ the encoding is designated for custom use, as the functionality overlaps with <>. +==== include::cm_lhsh_imm_offset.adoc[] @@ -31,7 +33,9 @@ Description:: This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting halfword is zero extended to XLEN bits and is written to _rd'_. [NOTE] +==== _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. +==== Prerequisites:: None diff --git a/src/zc/cm_mva01s.adoc b/src/zc/cm_mva01s.adoc index 9d36688..5b6d009 100644 --- a/src/zc/cm_mva01s.adoc +++ b/src/zc/cm_mva01s.adoc @@ -1,6 +1,6 @@ <<< [#insns-cm_mva01s,reftext="Move two s0-s7 registers into a0-a1"] -=== cm.mva01s +==== cm.mva01s Synopsis:: Move two s0-s7 registers into a0-a1 @@ -36,8 +36,9 @@ The encoding uses _sreg_ number specifiers instead of _xreg_ number specifiers t The mapping between them is specified in the pseudo-code below. [NOTE] - +==== The _s_ register mapping is taken from the UABI, and may not match the currently unratified EABI. _cm.mva01s.e_ may be included in the future. +==== Prerequisites:: None diff --git a/src/zc/cm_mvsa01.adoc b/src/zc/cm_mvsa01.adoc index fd59c85..7c4f6e2 100644 --- a/src/zc/cm_mvsa01.adoc +++ b/src/zc/cm_mvsa01.adoc @@ -1,6 +1,6 @@ <<< [#insns-cm_mvsa01,reftext="Move a0-a1 into two different s0-s7 registers"] -=== cm.mvsa01 +==== cm.mvsa01 Synopsis:: Move a0-a1 into two registers of s0-s7 @@ -22,7 +22,9 @@ Encoding (RV32, RV64):: .... [NOTE] +==== For the encoding to be legal _r1s'_ != _r2s'_. 
+==== Assembly Syntax:: @@ -39,8 +41,9 @@ The encoding uses _sreg_ number specifiers instead of _xreg_ number specifiers t The mapping between them is specified in the pseudo-code below. [NOTE] - +==== The _s_ register mapping is taken from the UABI, and may not match the currently unratified EABI. _cm.mvsa01.e_ may be included in the future. +==== Prerequisites:: None diff --git a/src/zc/cm_pop.adoc b/src/zc/cm_pop.adoc index 30e097e..fb9c880 100644 --- a/src/zc/cm_pop.adoc +++ b/src/zc/cm_pop.adoc @@ -1,6 +1,6 @@ <<< [#insns-cm_pop,reftext="Pop registers, deallocate stack frame."] -=== cm.pop +==== cm.pop Synopsis:: Destroy stack frame: load ra and 0 to 12 saved registers from the stack frame, deallocate the stack frame. @@ -21,9 +21,9 @@ Encoding (RV32, RV64):: .... [NOTE] - +==== _rlist_ values 0 to 3 are reserved for a future EABI variant called _cm.pop.e_ - +==== Assembly Syntax:: [source,sail] diff --git a/src/zc/cm_popret.adoc b/src/zc/cm_popret.adoc index 1150203..7650e6a 100644 --- a/src/zc/cm_popret.adoc +++ b/src/zc/cm_popret.adoc @@ -1,6 +1,6 @@ <<< [#insns-cm_popret,reftext="Pop registers, deallocate stack frame, return."] -=== cm.popret +==== cm.popret Synopsis:: Destroy stack frame: load ra and 0 to 12 saved registers from the stack frame, deallocate the stack frame, return to ra. @@ -21,8 +21,9 @@ Encoding (RV32, RV64):: .... [NOTE] - +==== _rlist_ values 0 to 3 are reserved for a future EABI variant called _cm.popret.e_ +==== Assembly Syntax:: diff --git a/src/zc/cm_popretz.adoc b/src/zc/cm_popretz.adoc index 10ccf35..d7e3bb8 100644 --- a/src/zc/cm_popretz.adoc +++ b/src/zc/cm_popretz.adoc @@ -1,6 +1,6 @@ <<< [#insns-cm_popretz,reftext="Pop registers, deallocate stack frame, return zero."] -=== cm.popretz +==== cm.popretz Synopsis:: Destroy stack frame: load ra and 0 to 12 saved registers from the stack frame, deallocate the stack frame, move zero into a0, return to ra. @@ -21,9 +21,9 @@ Encoding (RV32, RV64):: .... 
[NOTE] - +==== _rlist_ values 0 to 3 are reserved for a future EABI variant called _cm.popretz.e_ - +==== Assembly Syntax:: diff --git a/src/zc/cm_push.adoc b/src/zc/cm_push.adoc index 77f1fde..13b93fe 100644 --- a/src/zc/cm_push.adoc +++ b/src/zc/cm_push.adoc @@ -1,6 +1,6 @@ <<< [#insns-cm_push,reftext="Create stack frame: push registers, allocate additional stack space."] -=== cm.push +==== cm.push Synopsis:: Create stack frame: store ra and 0 to 12 saved registers to the stack frame, optionally allocate additional stack space. @@ -21,8 +21,9 @@ Encoding (RV32, RV64):: .... [NOTE] - +==== _rlist_ values 0 to 3 are reserved for a future EABI variant called _cm.push.e_ +==== Assembly Syntax:: diff --git a/src/zc/cm_sb.adoc b/src/zc/cm_sb.adoc index 265d039..b3e45ba 100644 --- a/src/zc/cm_sb.adoc +++ b/src/zc/cm_sb.adoc @@ -1,6 +1,6 @@ <<< [#insns-cm_sb,reftext="Store byte, 16-bit encoding"] -=== cm.sb +==== cm.sb Synopsis:: Store byte, 16-bit encoding @@ -23,7 +23,9 @@ Encoding (RV32, RV64):: .... [NOTE] +==== If _uimm < 4_ the encoding is designated for custom use, as the functionality overlaps with <>. +==== include::cm_lbsb_imm_offset.adoc[] @@ -31,7 +33,9 @@ Description:: This instruction stores the least significant byte of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. [NOTE] +==== _rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. +==== Prerequisites:: None diff --git a/src/zc/cm_sh.adoc b/src/zc/cm_sh.adoc index fb5e538..2464114 100644 --- a/src/zc/cm_sh.adoc +++ b/src/zc/cm_sh.adoc @@ -1,6 +1,6 @@ <<< [#insns-cm_sh,reftext="Store halfword, 16-bit encoding"] -=== cm.sh +==== cm.sh Synopsis:: Store halfword, 16-bit encoding @@ -23,7 +23,9 @@ Encoding (RV32, RV64):: .... [NOTE] +==== If _uimm < 4_ the encoding is designated for custom use, as the functionality overlaps with <>. 
+==== include::cm_lhsh_imm_offset.adoc[] @@ -31,7 +33,9 @@ Description:: This instruction stores the least significant halfword of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. [NOTE] +==== _rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. +==== Prerequisites:: None diff --git a/src/zc/jvt_csr.adoc b/src/zc/jvt_csr.adoc index 9ad2367..5484db3 100644 --- a/src/zc/jvt_csr.adoc +++ b/src/zc/jvt_csr.adoc @@ -1,6 +1,6 @@ <<< [#csrs-jvt,reftext="JVT CSR, table jump base vector and control register"] -=== JVT CSR +==== JVT CSR Synopsis:: Table jump base vector and control register @@ -51,7 +51,10 @@ The memory pointed to by _JVT.base_ is treated as instruction memory for the pur _JVT.mode_ is a *WARL* field, so can only be programmed to modes which are implemented. Therefore the discovery mechanism is to attempt to program different modes and read back the values to see which are available. Jump table mode _must_ be implemented. -NOTE: in future the RISC-V Unified Discovery method will report the available modes. +[NOTE] +==== +In future, the RISC-V Unified Discovery method will report the available modes. +==== Architectural State:: diff --git a/src/zc/pushpop.adoc b/src/zc/pushpop.adoc index e4d61b8..8c706cf 100644 --- a/src/zc/pushpop.adoc +++ b/src/zc/pushpop.adoc @@ -1,7 +1,7 @@ <<< [#insns-pushpop,reftext="PUSH/POP Register Instructions"] -== PUSH/POP register instructions +=== PUSH/POP register instructions These instructions are collectively referred to as PUSH/POP: @@ -18,7 +18,7 @@ The term POPRET refers to _cm.popret and cm.popretz_. Common details for these instructions are in this section. -=== PUSH/POP functional overview +==== PUSH/POP functional overview PUSH, POP, POPRET are used to reduce the size of function prologues and epilogues.
@@ -37,7 +37,7 @@ PUSH, POP, POPRET are used to reduce the size of function prologues and epilogue ** execute a _ret_ instruction to return from the function <<< -=== Example usage +==== Example usage This example gives an illustration of the use of PUSH and POPRET. @@ -113,27 +113,29 @@ As well as reducing the code-size PUSH and POPRET eliminate the branches from calling the millicode _save/restore_ routines and so may also perform better. [NOTE] - +==== The calls to _/_ become 64-bit when the target functions are out of the ±1MB range, increasing the prologue/epilogue size to 22-bytes. +==== [NOTE] - +==== POP is typically used in tail-calling sequences where _ret_ is not used to return to _ra_ after destroying the stack frame. +==== [#pushpop-areg-list] -==== Stack pointer adjustment handling +===== Stack pointer adjustment handling The instructions all automatically adjust the stack pointer by enough to cover the memory required for the registers being saved or restored. Additionally the _spimm_ field in the encoding allows the stack pointer to be adjusted in additional increments of 16-bytes. There is only a small restricted range available in the encoding; if the range is insufficient then a separate _c.addi16sp_ can be used to increase the range. -==== Register list handling +===== Register list handling There is no support for the _{ra, s0-s10}_ register list without also adding _s11_. Therefore the _{ra, s0-s11}_ register list must be used in this case. [#pushpop-idempotent-memory] -=== PUSH/POP Fault handling +==== PUSH/POP Fault handling Correct execution requires that _sp_ refers to idempotent memory (also see <>), because the core must be able to handle traps detected during the sequence. @@ -144,9 +146,9 @@ If a trap occurs during the sequence then _xEPC_ is updated with the PC of the i NOTE: It is implementation defined whether interrupts can also be taken during the sequence execution. 
[#pushpop-software-view] -=== Software view of execution +==== Software view of execution -==== Software view of the PUSH sequence +===== Software view of the PUSH sequence From a software perspective the PUSH sequence appears as: @@ -156,7 +158,10 @@ From a software perspective the PUSH sequence appears as: ** Any of the bytes may be written multiple times. * A stack pointer adjustment -NOTE: If an implementation allows interrupts during the sequence, and the interrupt handler uses _sp_ to allocate stack memory, then any stores which were executed before the interrupt may be overwritten by the handler. This is safe because the memory is idempotent and the stores will be re-executed when execution resumes. +[NOTE] +==== + If an implementation allows interrupts during the sequence, and the interrupt handler uses _sp_ to allocate stack memory, then any stores which were executed before the interrupt may be overwritten by the handler. This is safe because the memory is idempotent and the stores will be re-executed when execution resumes. +==== The stack pointer adjustment must only be committed only when it is certain that the entire PUSH instruction will commit. @@ -194,7 +199,7 @@ sw ra,-28(sp) addi sp, sp, -64 -- -==== Software view of the POP/POPRET sequence +===== Software view of the POP/POPRET sequence From a software perspective the POP/POPRET sequence appears as: @@ -243,7 +248,7 @@ ret -- [[pushpop_non-idem-mem]] -=== Non-idempotent memory handling +==== Non-idempotent memory handling An implementation may have a requirement to issue a PUSH/POP instruction to non-idempotent memory. @@ -255,12 +260,12 @@ being issued repeatedly in the case that they cause exceptions. <<< -=== Example RV32I PUSH/POP sequences +==== Example RV32I PUSH/POP sequences The examples are included show the load/store series expansion and the stack adjustment. 
Examples of _cm.popret_ and _cm.popretz_ are not included, as the difference in the expanded sequence from _cm.pop_ is trivial in all cases. -==== cm.push {ra, s0-s2}, -64 +===== cm.push {ra, s0-s2}, -64 Encoding: _rlist_=7, _spimm_=3 @@ -275,7 +280,7 @@ sw ra, -16(sp); addi sp, sp, -64; -- -==== cm.push {ra, s0-s11}, -112 +===== cm.push {ra, s0-s11}, -112 Encoding: _rlist_=15, _spimm_=3 @@ -301,7 +306,7 @@ addi sp, sp, -112; <<< -==== cm.pop {ra}, 16 +===== cm.pop {ra}, 16 Encoding: _rlist_=4, _spimm_=0 @@ -313,7 +318,7 @@ lw ra, 12(sp); addi sp, sp, 16; -- -==== cm.pop {ra, s0-s3}, 48 +===== cm.pop {ra, s0-s3}, 48 Encoding: _rlist_=8, _spimm_=1 @@ -329,7 +334,7 @@ lw ra, 28(sp); addi sp, sp, 48; -- -==== cm.pop {ra, s0-s4}, 64 +===== cm.pop {ra, s0-s4}, 64 Encoding: _rlist_=9, _spimm_=2 diff --git a/src/zc/pushpop_extra_info.adoc b/src/zc/pushpop_extra_info.adoc index 52bf69c..342e36d 100644 --- a/src/zc/pushpop_extra_info.adoc +++ b/src/zc/pushpop_extra_info.adoc @@ -1,7 +1,8 @@ [NOTE] - +==== All ABI register mappings are for the UABI. An EABI version is planned once the EABI is frozen. +==== For further information see <>. diff --git a/src/zc/tablejump.adoc b/src/zc/tablejump.adoc index fefa8fc..e490087 100644 --- a/src/zc/tablejump.adoc +++ b/src/zc/tablejump.adoc @@ -1,7 +1,7 @@ <<< [#insns-tablejump,reftext="Table Jump Overview"] -== Table Jump Overview +=== Table Jump Overview _cm.jt_ (<<#insns-cm_jt>>) and _cm.jalt_ (<<#insns-cm_jalt>>) are referred to as table jump. @@ -24,7 +24,7 @@ Table jump allows the linker to replace the following instruction sequences with If a return address stack is implemented, then as _cm.jalt_ is equivalent to _jal ra_, it pushes to the stack. -=== JVT +==== JVT The base of the table is in the JVT CSR (see <>), each table entry is XLEN bits. 
@@ -32,7 +32,7 @@ If the same function is called with and without linking then it must have two en This is typically caused by the same function being called with and without tail calling. [#tablejump-fault-handling] -=== Table Jump Fault handling +==== Table Jump Fault handling For a table jump instruction, the table entry that the instruction selects is considered an extension of the instruction itself. Hence, the execution of a table jump instruction involves two instruction fetches, the first to read the instruction (_cm.jt_/_cm.jalt_) -- cgit v1.1 From 76629566e79564c1a2d071c27a65926b9846f28a Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 6 Feb 2024 11:39:58 -0500 Subject: Initial seeding of zawrs chapter. Adding a new zawrs.adoc chapter --- src/riscv-unprivileged.adoc | 1 + src/zawrs.adoc | 131 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 132 insertions(+) create mode 100644 src/zawrs.adoc diff --git a/src/riscv-unprivileged.adoc b/src/riscv-unprivileged.adoc index 91a7a5d..21db1b5 100644 --- a/src/riscv-unprivileged.adoc +++ b/src/riscv-unprivileged.adoc @@ -126,6 +126,7 @@ include::zfa.adoc[] //zfa.tex include::ztso-st-ext.adoc[] //ztso.tex +include::zawrs.adoc[] include::rv-32-64g.adoc[] //gmaps.tex include::extending.adoc[] diff --git a/src/zawrs.adoc b/src/zawrs.adoc new file mode 100644 index 0000000..e684c8b --- /dev/null +++ b/src/zawrs.adoc @@ -0,0 +1,131 @@ +== RISC-V Wait-on-Reservation-Set (Zawrs) extension + +// Preamble +[WARNING] +.This document is in the link:http://riscv.org/spec-state[Ratified] +==== +No changes are allowed. Any desired or needed changes can be the subject of a +follow-on new extension. Ratified extensions are never revised +==== + +=== Copyright and license information +This specification is licensed under the Creative Commons +Attribution 4.0 International License (CC-BY 4.0). The full +license text is available at +https://creativecommons.org/licenses/by/4.0/. + +Copyright 2022 by RISC-V International. 
+ +=== Contributors + +This RISC-V specification has been contributed to directly or indirectly by: + +Aaron Durbin, Abel Bernabeu, Allen Baum, Christoph Müllner, David Weaver, Greg Favor, Josh Scheid, Ken Dockser, Paul Donahue, Phil McCoy, Philipp Tomsich, Tariq Kurd, Ved Shanbhogue + +=== Introduction +The Zawrs extension defines a pair of instructions to be used in polling loops +that allows a core to enter a low-power state and wait on a store to a memory +location. Waiting for a memory location to be updated is a common pattern in +many use cases such as: + +. Contenders for a lock waiting for the lock variable to be updated. + +. Consumers waiting on the tail of an empty queue for the producer to queue + work/data. The producer may be code executing on a RISC-V hart, an accelerator + device, an external I/O agent. + +. Code waiting on a flag to be set in memory indicative of an event occurring. + For example, software on a RISC-V hart may wait on a "done" flag to be set in + memory by an accelerator device indicating completion of a job previously + submitted to the device. + +Such use cases involve polling on memory locations, and such busy loops can be a +wasteful expenditure of energy. To mitigate the wasteful looping in such usages, +a `WRS.NTO` (WRS-with-no-timeout) instruction is provided. Instead of polling +for a store to a specific memory location, software registers a reservation set +that includes all the bytes of the memory location using the `LR` instruction. +Then a subsequent `WRS.NTO` instruction would cause the hart to temporarily +stall execution in a low-power state until a store occurs to the reservation set +or an interrupt is observed. + +Sometimes the program waiting on a memory update may also need to carry out a +task at a future time or otherwise place an upper bound on the wait. 
To support +such use cases a second instruction `WRS.STO` (WRS-with-short-timeout) is +provided that works like `WRS.NTO` but bounds the stall duration to an +implementation-defined short timeout such that the stall is terminated on the +timeout if no other conditions have occurred to terminate the stall. The +program using this instruction may then determine if its deadline has been +reached. + +[NOTE] +==== +The instructions in the Zawrs extension are only useful in conjunction with the +LR instructions, which are provided by the A extension, and which we also expect +to be provided by a narrower Zalrsc extension in the future. +==== + +[[Zawrs]] +=== Zawrs + +The `WRS.NTO` and `WRS.STO` instructions cause the hart to temporarily stall +execution in a low-power state as long as the reservation set is valid and no +pending interrupts, even if disabled, are observed. For `WRS.STO` the stall +duration is bounded by an implementation-defined short timeout. These +instructions are available in all privilege modes. These instructions are not +supported in a constrained `LR`/`SC` loop. + +*Encoding:* +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['SYSTEM(0x73)'] }, + {bits: 5, name: 'rd', attr: ['0'] }, + {bits: 3, name: 'funct3', attr: ['0'] }, + {bits: 5, name: 'rs1', attr: ['0'] }, + {bits: 12, name: 'funct12', attr:['WRS.NTO(0x0d)', 'WRS.STO(0x1d)'] }, +], config:{lanes: 1, hspace:1024}} +.... + +*Operation:* +[source,asciidoc, linenums] +.... +Hart execution may be stalled while the following conditions are all satisfied: + a) The reservation set is valid + b) If `WRS.STO`, a "short" duration since start of stall has not elapsed + c) No pending interrupt is observed (see the rules below) +.... + +While stalled, an implementation is permitted to occasionally terminate the +stall and complete execution for any reason. + +`WRS.NTO` and `WRS.STO` instructions follow the rules of the `WFI` instruction +for resuming execution on a pending interrupt.
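The three conditions in the Operation listing can be read as a single predicate. The C sketch below is only an illustration of that reading: the names `wrs_kind` and `wrs_may_stall` are hypothetical and do not come from the Zawrs text, and the model deliberately ignores the permission for an implementation to end the stall early for any reason.

```c
#include <stdbool.h>

/* Hypothetical model of the Zawrs stall condition; names are illustrative. */
typedef enum { WRS_NTO, WRS_STO } wrs_kind;

/* Returns true while the hart is still permitted to remain stalled. */
static bool wrs_may_stall(wrs_kind kind,
                          bool reservation_valid,
                          bool short_timeout_elapsed,
                          bool interrupt_pending)
{
    /* a) The reservation set must still be valid. */
    if (!reservation_valid)
        return false;
    /* b) Only WRS.STO is bounded by the short timeout. */
    if (kind == WRS_STO && short_timeout_elapsed)
        return false;
    /* c) A pending interrupt (even if disabled) ends the stall. */
    if (interrupt_pending)
        return false;
    return true;
}
```

Because the stall may end for reasons other than a store to the reservation set, software using either instruction would still re-check the awaited memory location in a loop.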
+ +When the `TW` (Timeout Wait) bit in `mstatus` is set and `WRS.NTO` is executed +in any privilege mode other than M mode, and it does not complete within an +implementation-specific bounded time limit, the `WRS.NTO` instruction will cause +an illegal instruction exception. + +When executing in VS or VU mode, if the `VTW` bit is set in `hstatus`, the +`TW` bit in `mstatus` is clear, and the `WRS.NTO` does not complete within an +implementation-specific bounded time limit, the `WRS.NTO` instruction will cause +a virtual instruction exception. + +[NOTE] +==== +Since the `WRS.STO` and `WRS.NTO` instructions can complete execution for +reasons other than stores to the reservation set, software will likely need +a means of looping until the required stores have occurred. + +The duration of a `WRS.STO` instruction's timeout may vary significantly within +and among implementations. In typical implementations this duration should be +roughly in the range of 10 to 100 times an on-chip cache miss latency or a +cacheless access to main memory. + +`WRS.NTO`, unlike `WFI`, is not specified to cause an illegal instruction +exception if executed in U-mode when the governing `TW` bit is 0. `WFI` is +typically not expected to be used in U-mode and on many systems may promptly +cause an illegal instruction exception if used at U-mode. Unlike `WFI`, +`WRS.NTO` is expected to be used by software in U-mode when waiting on +memory but without a deadline for that wait. +==== \ No newline at end of file -- cgit v1.1 From 961ddc376e2735fb1916f513a16c0026edda30fd Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 6 Feb 2024 11:42:15 -0500 Subject: Removing ratified warning and copyright As the whole spec is ratified and copyright applies to the whole document, removing from this chapter. 
--- src/zawrs.adoc | 16 ---------------- 1 file changed, 16 deletions(-) diff --git a/src/zawrs.adoc b/src/zawrs.adoc index e684c8b..b47be88 100644 --- a/src/zawrs.adoc +++ b/src/zawrs.adoc @@ -1,21 +1,5 @@ == RISC-V Wait-on-Reservation-Set (Zawrs) extension -// Preamble -[WARNING] -.This document is in the link:http://riscv.org/spec-state[Ratified] -==== -No changes are allowed. Any desired or needed changes can be the subject of a -follow-on new extension. Ratified extensions are never revised -==== - -=== Copyright and license information -This specification is licensed under the Creative Commons -Attribution 4.0 International License (CC-BY 4.0). The full -license text is available at -https://creativecommons.org/licenses/by/4.0/. - -Copyright 2022 by RISC-V International. - === Contributors This RISC-V specification has been contributed to directly or indirectly by: -- cgit v1.1 From 772bfd2fabe89b3d8d21d358bbb97606e59a3bf6 Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 6 Feb 2024 16:06:47 -0500 Subject: Integration of Smstateen extension chapter. Added the applicable adoc content from the Smstateen repository to the smstateen.adoc file and included in the Priv header. Set chapter anchor, titled to be consistent with other extension chapter titles in priv and set header levels appropriately. 
--- src/riscv-privileged.adoc | 1 + src/smstateen.adoc | 335 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 336 insertions(+) create mode 100644 src/smstateen.adoc diff --git a/src/riscv-privileged.adoc b/src/riscv-privileged.adoc index 452f914..ec989fe 100644 --- a/src/riscv-privileged.adoc +++ b/src/riscv-privileged.adoc @@ -86,6 +86,7 @@ include::priv-intro.adoc[] include::priv-csrs.adoc[] //machine.tex include::machine.adoc[] +include::smstateen.adoc[] //rnmi.tex include::rnmi.adoc[] //supervisor.tex diff --git a/src/smstateen.adoc b/src/smstateen.adoc new file mode 100644 index 0000000..360eb31 --- /dev/null +++ b/src/smstateen.adoc @@ -0,0 +1,335 @@ +[[smstateen]] +== "Smstateen" State Enable Extension + +=== Motivation + +The implementation of optional RISC-V extensions has the potential to open +covert channels between separate user threads, or between separate guest OSes +running under a hypervisor. The problem occurs when an extension adds processor +state---usually explicit registers, but possibly other forms of state---that +the main OS or hypervisor is unaware of (and hence won't context-switch) but +that can be modified/written by one user thread or guest OS and +perceived/examined/read by another. + +For example, the proposed Advanced Interrupt Architecture (AIA) for RISC-V adds +to a hart as many as ten supervisor-level CSRs (`siselect`, `sireg`, `stopi`, +`sseteipnum`, `sclreipnum`, `sseteienum`, `sclreienum`, `sclaimei`, `sieh`, and `siph`) and +provides also the option for hardware to be backward-compatible with older, +pre-AIA software. Because an older hypervisor that is oblivious to the AIA will +not know to swap any of the AIA's new CSRs on context switches, the registers may +then be used as a covert channel between multiple guest OSes that run atop this +hypervisor.
Although traditional practices might consider such a communication +channel harmless, the intense focus on security today argues that a means be +offered to plug such channels. + +The `f` registers of the RISC-V floating-point extensions and the `v` registers of +the vector extension would similarly be potential covert channels between user +threads, except for the existence of the FS and VS fields in the `sstatus` +register. Even if an OS is unaware of, say, the vector extension and its `v` +registers, access to those registers is blocked when the VS field is +initialized to zero, either at machine level or by the OS itself initializing +`sstatus`. + +Obviously, one way to prevent the use of new user-level CSRs as covert channels +would be to add to `mstatus` or `sstatus` an "XS" field for each relevant +extension, paralleling the V extension's VS field. However, this is not +considered a general solution to the problem due to the number of potential +future extensions that may add small amounts of state. Even with a 64-bit +`sstatus` (necessitating adding `sstatush` for RV32), it is not certain there are +enough remaining bits in `sstatus` to accommodate all future user-level +extensions. In any event, there is no need to strain `sstatus` (and add `sstatush`) +for this purpose. The "enable" flags that are needed to plug covert channels +are not generally expected to require swapping on context switches of user +threads, making them a less-than-compelling candidate for inclusion in `sstatus`. +Hence, a new place is proposed for them instead. 
+ +=== Proposal + +For RV64 harts, this extension adds four new 64-bit CSRs at machine level, +listed with their CSR addresses: + +`0x30C mstateen0` (Machine State Enable 0) + +`0x30D mstateen1` + +`0x30E mstateen2` + +`0x30F mstateen3` + +If supervisor mode is implemented, another four CSRs are defined at supervisor +level: + +`0x10C sstateen0` + +`0x10D sstateen1` + +`0x10E sstateen2` + +`0x10F sstateen3` + +And if the hypervisor extension is implemented, another set of CSRs is added: + +`0x60C hstateen0` + +`0x60D hstateen1` + +`0x60E hstateen2` + +`0x60F hstateen3` + +For RV32, the registers listed above are 32-bit, and for the machine-level and +hypervisor CSRs there is a corresponding set of high-half CSRs for the upper 32 +bits of each register: + +`0x31C mstateen0h` + +`0x31D mstateen1h` + +`0x31E mstateen2h` + +`0x31F mstateen3h` + +`0x61C hstateen0h` + +`0x61D hstateen1h` + +`0x61E hstateen2h` + +`0x61F hstateen3h` + +For the supervisor-level `sstateen` registers, high-half CSRs are not added at +this time because it is expected the upper 32 bits of these registers will +always be zeros, as explained later below. + +Each bit of a `stateen` CSR controls less-privileged access to an extension's +state, for an extension that was not deemed "worthy" of a full XS field in +`sstatus` like the FS and VS fields for the F and V extensions. The number of +registers provided at each level is four because it is believed that 4 * 64 = +256 bits for machine and hypervisor levels, and 4 * 32 = 128 bits for +supervisor level, will be adequate for many years to come, perhaps for as long +as the RISC-V ISA is in use. The exact number four is an attempted compromise +between providing too few bits on the one hand and going overboard with CSRs +that will never be used on the other. A possible future doubling of the number +of `stateen` CSRs is covered later. 
+ +The `stateen` registers at each level control access to state at all +less-privileged levels, but not at its own level. This is analogous to how the +existing `counteren` CSRs control access to performance counter registers. Just +as with the `counteren` CSRs, when a `stateen` CSR prevents access to state by +less-privileged levels, an attempt in one of those privilege modes to execute +an instruction that would read or write the protected state raises an illegal +instruction exception, or, if executing in VS or VU mode and the circumstances +for a virtual instruction exception apply, raises a virtual instruction +exception instead of an illegal instruction exception. + +When this extension is not implemented, all state added by an extension is +accessible as defined by that extension. + +When a `stateen` CSR prevents access to state for a privilege mode, attempting to +execute in that privilege mode an instruction that _implicitly_ updates the +state without reading it may or may not raise an illegal instruction or virtual +instruction exception. Such cases must be disambiguated by being explicitly +specified one way or the other. + +In some cases, the bits of the `stateen` CSRs will have a dual purpose as enables +for the ISA extensions that introduce the controlled state. + +Each bit of a supervisor-level `sstateen` CSR controls user-level access (from +U-mode or VU-mode) to an extension's state. The intention is to allocate the +bits of `sstateen` CSRs starting at the least-significant end, bit 0, through to +bit 31, and then on to the next-higher-numbered `sstateen` CSR. + +For every bit with a defined purpose in an `sstateen` CSR, the same bit is +defined in the matching `mstateen` CSR to control access below machine level to +the same state. 
The upper 32 bits of an `mstateen` CSR (or for RV32, the +corresponding high-half CSR) control access to state that is inherently +inaccessible to user level, so no corresponding enable bits in the +supervisor-level `sstateen` CSR are applicable. The intention is to allocate bits +for this purpose starting at the most-significant end, bit 63, through to bit +32, and then on to the next-higher `mstateen` CSR. If the rate that bits are +being allocated from the least-significant end for `sstateen` CSRs is +sufficiently low, allocation from the most-significant end of `mstateen` CSRs may +be allowed to encroach on the lower 32 bits before jumping to the next-higher +`mstateen` CSR. In that case, the bit positions of "encroaching" bits will remain +forever read-only zeros in the matching `sstateen` CSRs. + +With the hypervisor extension, the `hstateen` CSRs have identical encodings to +the `mstateen` CSRs, except controlling accesses for a virtual machine (from VS +and VU modes). + +Each standard-defined bit of a `stateen` CSR is WARL and may be read-only zero or +one, subject to the following conditions. + +Bits in any `stateen` CSR that are defined to control state that a hart doesn't +implement are read-only zeros for that hart. Likewise, all reserved bits not +yet given a defined meaning are also read-only zeros. For every bit in an +`mstateen` CSR that is zero (whether read-only zero or set to zero), the same bit +appears as read-only zero in the matching `hstateen` and `sstateen` CSRs. For every +bit in an `hstateen` CSR that is zero (whether read-only zero or set to zero), +the same bit appears as read-only zero in `sstateen` when accessed in VS-mode. + +A bit in a supervisor-level `sstateen` CSR cannot be read-only one unless the +same bit is read-only one in the matching `mstateen` CSR and, if it exists, in +the matching `hstateen` CSR. A bit in an `hstateen` CSR cannot be read-only one +unless the same bit is read-only one in the matching `mstateen` CSR. 
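The read-only-zero propagation described above (a zero bit in `mstateen` reads as zero in the matching `hstateen` and `sstateen`, and a zero bit in `hstateen` reads as zero in `sstateen` from VS-mode) can be sketched as simple masking. This is a minimal model under stated assumptions: the function names are hypothetical, all registers are treated as 64-bit values, and only the value observed on a read is modeled, not what the underlying storage retains.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: value of hstateenN as observed by software.
 * Bits that are zero in the matching mstateenN read as zero. */
static uint64_t hstateen_visible(uint64_t mstateen, uint64_t hstateen)
{
    return hstateen & mstateen;
}

/* Hypothetical helper: value of sstateenN as observed by software.
 * `virt` is true when the access is made from VS-mode, in which case
 * bits that are zero in the matching hstateenN also read as zero. */
static uint64_t sstateen_visible(uint64_t mstateen, uint64_t hstateen,
                                 uint64_t sstateen, bool virt)
{
    uint64_t value = sstateen & mstateen;
    if (virt)
        value &= hstateen;
    return value;
}
```

The masking form makes the ordering of the levels explicit: a lower level can never observe a bit as one unless every more-privileged level above it also has that bit set.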
+ +On reset, all writable `mstateen` bits are initialized by the hardware to zeros. +If machine-level software changes these values, it is responsible for +initializing the corresponding writable bits of the `hstateen` and `sstateen` CSRs +to zeros too. Software at each privilege level should set its respective +`stateen` CSRs to indicate the state it is prepared to allow less-privileged +software to access. For OSes and hypervisors, this usually means the state that +the OS or hypervisor is prepared to swap on a context switch, or to manage in +some other way. + +For each `mstateen` CSR, bit 63 is defined to control access to the +matching `sstateen` and `hstateen` CSRs. +That is, bit 63 of `mstateen0` controls access to `sstateen0` and `hstateen0`; +bit 63 of `mstateen1` controls access to `sstateen1` and `hstateen1`; etc. +Likewise, bit 63 of each `hstateen` correspondingly controls access to +the matching `sstateen` CSR. +A hypervisor may need this control over +accesses to the `sstateen` CSRs if it ever must emulate for a virtual machine an +extension that is supposed to be affected by a bit in an `sstateen` CSR. (Even if +such emulation is uncommon, it should not be excluded.) Machine-level software +needs identical control to be able to emulate the hypervisor extension. (That +is, machine level needs control over accesses to the supervisor-level `sstateen` +CSRs in order to emulate the `hstateen` CSRs, which have such control.) + +Bit 63 of each `mstateen` CSR may be read-only zero only if the hypervisor +extension is not implemented and the matching supervisor-level `sstateen` CSR is +all read-only zeros. In that case, machine-level software should emulate +attempts to access the affected `sstateen` CSR from S-mode, ignoring writes and +returning zero for reads. Bit 63 of each `hstateen` CSR is always writable (not +read-only). 
+ +Initially, the following bits are defined in `mstateen0`, +`hstateen0`, and `sstateen0`: + +bit 0 - Custom state + +bit 1 - `fcsr` for Zfinx and related extensions (Zdinx, etc.) + +Bit 0 controls access to any and all custom state. + +(Bit 0 of these registers is not custom state itself; it is a standard field of +a standard CSR, either `mstateen0`, `hstateen0`, or `sstateen0`. The requirements +that non-standard extensions must meet to be _conforming_ are not relaxed due +solely to changes in the value of this bit. In particular, if software sets +this bit but does not execute any custom instructions or access any custom +state, the software must continue to execute as specified by all relevant +RISC-V standards, or the hardware is not standard-conforming.) + +Bit 1 applies only for the case when floating-point instructions operate on `x` +registers instead of `f` registers. Whenever `misa`.F = 1, bit 1 of `mstateen0` is +read-only zero (and hence read-only zero in `hstateen0` and `sstateen0` too). For +convenience, when the `stateen` CSRs are implemented and `misa`.F = 0, then if bit +1 of a controlling `stateen0` CSR is zero, _all_ floating-point instructions +cause an illegal instruction trap (or virtual instruction trap, if relevant), +as though they all access `fcsr`, regardless of whether they really do. 
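The two user-accessible bits defined so far, and the rule for bit 1, can be written down as constants and a small predicate. The sketch below is illustrative only: the macro and function names are hypothetical, and `controlling_stateen0` is assumed to be the bitwise AND of every `stateen0` CSR that governs the current privilege mode. It covers only the `stateen`-based trap described in the text; when `misa`.F = 1 any trapping is governed by other mechanisms (such as the FS field), which are not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the initially defined low bits of the stateen0 CSRs. */
#define STATEEN0_CUSTOM (UINT64_C(1) << 0) /* bit 0: all custom state */
#define STATEEN0_FCSR   (UINT64_C(1) << 1) /* bit 1: fcsr for Zfinx and related */

/* Returns true if, under the bit 1 rule alone, every floating-point
 * instruction traps in the current privilege mode. */
static bool fp_insn_traps(bool misa_f, uint64_t controlling_stateen0)
{
    /* With misa.F = 1, bit 1 is read-only zero and this rule does not apply. */
    if (misa_f)
        return false;
    return (controlling_stateen0 & STATEEN0_FCSR) == 0;
}
```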
+ +In addition to the bits listed above for user-accessible state, the following +are also defined initially for `mstateen0`: + +bit 57 - `hcontext`, `scontext` + +bits 60:58 - Reserved for the RISC-V Advanced Interrupt Architecture + +bit 61 - Reserved for possible `henvcfg2`/`henvcfg2h`, `senvcfg2` + +bit 62 - `henvcfg`/`henvcfgh`, `senvcfg` + +bit 63 - `hstateen0`/`hstateen0h`, `sstateen0` + +The bits defined initially for `hstateen0` are the same as those for `mstateen0` +except applying only to state that is accessible in VS-mode: + +bit 57 - `scontext` + +bits 60:58 - Reserved for the RISC-V Advanced Interrupt Architecture + +bit 61 - Reserved for a possible `senvcfg2` + +bit 62 - `senvcfg` + +bit 63 - `sstateen0` + +(Setting `hstateen0` bit 58 to zero prevents a virtual machine from accessing the +hart's IMSIC the same as setting `hstatus`.VGEIN = 0.) + +=== Usage + +After the writable bits of the machine-level `mstateen` CSRs are initialized to +zeros on reset, machine-level software can set bits in these registers to +enable less-privileged access to the controlled state. This may be either +because machine-level software knows how to swap the state or, more likely, +because machine-level software isn't swapping supervisor-level environments. +(Recall that the main reason the `mstateen` CSRs must exist is so machine level +can emulate the hypervisor extension. When machine level isn't emulating the +hypervisor extension, it is likely there will be no need to keep any +implemented `mstateen` bits zero.) + +If machine level sets any writable `mstateen` bits to nonzero, it must initialize +the matching `hstateen` CSRs, if they exist, by writing zeros to them. And if any +`mstateen` bits that are set to one have matching bits in the `sstateen` CSRs, +machine-level software must also initialize those `sstateen` CSRs by writing +zeros to them. 
Ordinarily, machine-level software will want to set bit 63 of +all `mstateen` CSRs, necessitating that it write zero to all `hstateen` CSRs. + +Software should ensure that all writable bits of `sstateen` CSRs are initialized +to zeros when an OS at supervisor level is first entered. The OS can then set +bits in these registers to enable user-level access to the controlled state, +presumably because it knows how to context-swap the state. + +For the `sstateen` CSRs whose access by a guest OS is permitted by bit 63 of the +corresponding `hstateen` CSRs, a hypervisor must include the `sstateen` CSRs in the +context it swaps for a guest OS. When it starts a new guest OS, it must ensure +the writable bits of those `sstateen` CSRs are initialized to zeros, and it must +emulate accesses to any other `sstateen` CSRs. + +If software at any privilege level does not support multiple contexts for +less-privilege levels, then it may choose to maximize less-privileged access to +all state by writing a value of all ones to the `stateen` CSRs at its level (the +`mstateen` CSRs for machine level, the `sstateen` CSRs for an OS, and the `hstateen` +CSRs for a hypervisor), without knowing all the state to which it is granting +access. This is justified because there is no risk of a covert channel between +execution contexts at the less-privileged level when only one context exists +at that level. This situation is expected to be common for machine level, and +it might also arise, for example, for a type-1 hypervisor that hosts only a +single guest virtual machine. 
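The machine-level initialization duty described in this Usage section can be sketched against a plain in-memory stand-in for the CSRs. Everything named here (`struct stateen_csrs`, `machine_init_stateen`, `NUM_STATEEN`) is hypothetical, real code would use CSR access instructions, and WARL behavior is not modeled. As a conservative simplification, the sketch zeros the matching `sstateen` register whenever any `mstateen` bit is set, which is at least what the text requires.

```c
#include <stdint.h>

#define NUM_STATEEN 4

/* Hypothetical in-memory stand-in for the stateen CSRs of one hart. */
struct stateen_csrs {
    uint64_t mstateen[NUM_STATEEN];
    uint64_t hstateen[NUM_STATEEN];
    uint64_t sstateen[NUM_STATEEN];
};

/* Machine-level setup: enable the requested mstateen bits, then write
 * zeros to the matching lower-level CSRs so they start from a known state. */
static void machine_init_stateen(struct stateen_csrs *csrs,
                                 const uint64_t enable[NUM_STATEEN])
{
    for (int i = 0; i < NUM_STATEEN; i++) {
        csrs->mstateen[i] = enable[i];
        if (enable[i] != 0) {
            csrs->hstateen[i] = 0;
            csrs->sstateen[i] = 0;
        }
    }
}
```

Software that supports only a single less-privileged context could instead pass all-ones masks, following the reasoning in the final paragraph of the section.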
+ +=== Possible expansion + +If a need is anticipated, the set of `stateen` CSRs could in the future be +doubled by adding these: + +`0x38C mstateen4` `0x39C mstateen4h` + +`0x38D mstateen5` `0x39D mstateen5h` + +`0x38E mstateen6` `0x39E mstateen6h` + +`0x38F mstateen7` `0x39F mstateen7h` + +`0x18C sstateen4` + +`0x18D sstateen5` + +`0x18E sstateen6` + +`0x18F sstateen7` + +`0x68C hstateen4` `0x69C hstateen4h` + +`0x68D hstateen5` `0x69D hstateen5h` + +`0x68E hstateen6` `0x69E hstateen6h` + +`0x68F hstateen7` `0x69F hstateen7h` + +These additional CSRs are not a definite part of the original proposal because +it is unclear whether they will ever be needed, and it is believed the rate of +consumption of bits in the first group, registers numbered 0-3, will be slow +enough that any looming shortage will be perceptible many years in advance. At +the moment, it is not known even how many years it may take to exhaust just +`mstateen0`, `sstateen0`, and `hstateen0`. \ No newline at end of file -- cgit v1.1 From b647ef585234b896aa75ff6c8d4147f295e7c44e Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 6 Feb 2024 18:13:48 -0500 Subject: Adding smepmp chapter. Adding smepmp chapter to the Privileged Specification. 
--- src/images/smepmp-visual-representation.png | Bin 0 -> 89113 bytes src/riscv-privileged.adoc | 1 + src/smepmp.adoc | 170 ++++++++++++++++++++++++++++ 3 files changed, 171 insertions(+) create mode 100644 src/images/smepmp-visual-representation.png create mode 100644 src/smepmp.adoc diff --git a/src/images/smepmp-visual-representation.png b/src/images/smepmp-visual-representation.png new file mode 100644 index 0000000..9502271 Binary files /dev/null and b/src/images/smepmp-visual-representation.png differ diff --git a/src/riscv-privileged.adoc b/src/riscv-privileged.adoc index 452f914..c868824 100644 --- a/src/riscv-privileged.adoc +++ b/src/riscv-privileged.adoc @@ -86,6 +86,7 @@ include::priv-intro.adoc[] include::priv-csrs.adoc[] //machine.tex include::machine.adoc[] +include::smepmp.adoc[] //rnmi.tex include::rnmi.adoc[] //supervisor.tex diff --git a/src/smepmp.adoc b/src/smepmp.adoc new file mode 100644 index 0000000..c52d77c --- /dev/null +++ b/src/smepmp.adoc @@ -0,0 +1,170 @@ +[[smepmp]] +== PMP Enhancements for memory access and execution prevention on Machine mode (Smepmp) +=== Introduction + +Being able to access the memory of a process running at a high privileged execution mode, such as the Supervisor or Machine mode, from a lower privileged mode such as the User mode, introduces an obvious attack vector since it allows for an attacker to perform privilege escalation, and tamper with the code and/or data of that process. A less obvious attack vector exists when the reverse happens, in which case an attacker instead of tampering with code and/or data that belong to a high-privileged process, can tamper with the memory of an unprivileged / less-privileged process and trick the high-privileged process to use or execute it. + +To prevent this attack vector, two mechanisms known as Supervisor Memory Access Prevention (SMAP) and Supervisor Memory Execution Prevention (SMEP) were introduced in recent systems. 
The first one prevents the OS from accessing the memory of an unprivileged process unless a specific code path is followed, and the second one prevents the OS from executing the memory of an unprivileged process at all times. RISC-V already includes support for SMAP, through the ``sstatus.SUM`` bit, and for SMEP by always denying execution of virtual memory pages marked with the U bit, with Supervisor mode (OS) privileges, as mandated on the Privilege Spec. + + +[NOTE] +==== +Terms: + +* *PMP Entry*: A pair of ``pmpcfg[i]`` / ``pmpaddr[i]`` registers. +* *PMP Rule*: The contents of a pmpcfg register and its associated pmpaddr register(s), that encode a valid protected physical memory region, where ``pmpcfg[i].A != OFF``, and if ``pmpcfg[i].A == TOR``, ``pmpaddr[i-1] < pmpaddr[i]``. +* *Ignored*: Any permissions set by a matching PMP rule are ignored, and _all_ accesses to the requested address range are allowed. +* *Enforced*: Only access types configured in the PMP rule matching the requested address range are allowed; failures will cause an access-fault exception. +* *Denied*: Any permissions set by a matching PMP rule are ignored, and _no_ accesses to the requested address range are allowed; failures will cause an access-fault exception. +* *Locked*: A PMP rule/entry where the ``pmpcfg.L`` bit is set. +* *PMP reset*: A reset process where all PMP settings of the hart, including locked rules/settings, are re-initialized to a set of safe defaults, before releasing the hart (back) to the firmware / OS / application. +==== + +==== Threat model + +However, there are no such mechanisms available on Machine mode in the current (v1.11) Privileged Spec. It is not possible for a PMP rule to be *enforced* only on non-Machine modes and *denied* on Machine mode, to only allow access to a memory region by less-privileged modes.
It is only possible to have a *locked* rule that will be *enforced* on all modes, or a rule that will be *enforced* on non-Machine modes and be *ignored* by Machine mode. So for any physical memory region which is not protected with a Locked rule, Machine mode has unlimited access, including the ability to execute it. + +Without being able to protect less-privileged modes from Machine mode, it is not possible to prevent the mentioned attack vector. This becomes even more important for RISC-V than on other architectures, since implementations are allowed where a hart only has Machine and User modes available, so the whole OS will run on Machine mode instead of the non-existent Supervisor mode. In such implementations the attack surface is greatly increased, and the same kind of attacks performed on Supervisor mode and mitigated through SMAP/SMEP, can be performed on Machine mode without any available mitigations. Even on implementations with Supervisor mode present, attacks are still possible against the Firmware and/or the Secure Monitor running on Machine mode. + +[[proposal]] +=== Proposal + +. *Machine Security Configuration (mseccfg)* is a new RW Machine mode CSR, used for configuring various security mechanisms present on the hart, and only accessible to Machine mode. It is 64 bits wide, and is at address *0x747 on RV64* and *0x747 (low 32 bits), 0x757 (high 32 bits) on RV32*. All mseccfg fields defined on this proposal are WARL, and the remaining bits are reserved for future standard use and should always read zero. The reset value of mseccfg is implementation-specific, otherwise if backwards compatibility is a requirement it should reset to zero on hard reset. + +. On ``mseccfg`` we introduce a field on bit 2 called *Rule Locking Bypass (mseccfg.RLB)* with the following functionality: ++ +.. When ``mseccfg.RLB`` is 1 *locked* PMP rules may be removed/modified and *locked* PMP entries may be edited. + +..
When ``mseccfg.RLB`` is 0 and ``pmpcfg.L`` is 1 in any rule or entry (including disabled entries), then ``mseccfg.RLB`` remains 0 and any further modifications to ``mseccfg.RLB`` are ignored until a *PMP reset*. ++ +[CAUTION] +==== +Note that this feature is intended to be used as a debug mechanism, or as a temporary workaround during the boot process for simplifying software, and optimizing the allocation of memory and PMP rules. Using this functionality under normal operation, after the boot process is completed, should be avoided since it weakens the protection of _M-mode-only_ rules. Vendors who don’t need this functionality may hardwire this field to 0. +==== + +. On ``mseccfg`` we introduce a field in bit 1 called *Machine Mode Whitelist Policy (mseccfg.MMWP)*. This is a sticky bit, meaning that once set it cannot be unset until a *PMP reset*. When set it changes the default PMP policy for M-mode when accessing memory regions that don’t have a matching PMP rule, to *denied* instead of *ignored*. + +. On ``mseccfg`` we introduce a field in bit 0 called *Machine Mode Lockdown (mseccfg.MML)*. This is a sticky bit, meaning that once set it cannot be unset until a *PMP reset*. When ``mseccfg.MML`` is set the system's behavior changes in the following way: + +.. The meaning of ``pmpcfg.L`` changes: Instead of marking a rule as *locked* and *enforced* in all modes, it now marks a rule as *M-mode-only* when set and *S/U-mode-only* when unset. The formerly reserved encoding of ``pmpcfg.RW=01``, and the encoding ``pmpcfg.LRWX=1111``, now encode a *Shared-Region*. ++ +An _M-mode-only_ rule is *enforced* on Machine mode and *denied* in Supervisor or User mode. It also remains *locked* so that any further modifications to its associated configuration or address registers are ignored until a *PMP reset*, unless ``mseccfg.RLB`` is set. ++ +An _S/U-mode-only_ rule is *enforced* on Supervisor and User modes and *denied* on Machine mode. 
++ +A _Shared-Region_ rule is *enforced* on all modes, with restrictions depending on the ``pmpcfg.L`` and ``pmpcfg.X`` bits: ++ +* A _Shared-Region_ rule where ``pmpcfg.L`` is not set can be used for sharing data between M-mode and S/U-mode, so is not executable. M-mode has read/write access to that region, and S/U-mode has read access if ``pmpcfg.X`` is not set, or read/write access if ``pmpcfg.X`` is set. ++ +* A _Shared-Region_ rule where ``pmpcfg.L`` is set can be used for sharing code between M-mode and S/U-mode, so is not writeable. Both M-mode and S/U-mode have execute access on the region, and M-mode also has read access if ``pmpcfg.X`` is set. The rule remains *locked* so that any further modifications to its associated configuration or address registers are ignored until a *PMP reset*, unless ``mseccfg.RLB`` is set. ++ +* The encoding ``pmpcfg.LRWX=1111`` can be used for sharing data between M-mode and S/U mode, where both modes only have read-only access to the region. The rule remains *locked* so that any further modifications to its associated configuration or address registers are ignored until a *PMP reset*, unless ``mseccfg.RLB`` is set. + + +.. Adding a rule with executable privileges that either is *M-mode-only* or a *locked* *Shared-Region* is not possible and such ``pmpcfg`` writes are ignored, leaving ``pmpcfg`` unchanged. This restriction can be temporarily lifted by setting ``mseccfg.RLB`` e.g. during the boot process. + +.. Executing code with Machine mode privileges is only possible from memory regions with a matching *M-mode-only* rule or a *locked* *Shared-Region* rule with executable privileges. Executing code from a region without a matching rule or with a matching _S/U-mode-only_ rule is *denied*. + +.. If ``mseccfg.MML`` is not set, the combination of ``pmpcfg.RW=01`` remains reserved for future standard use. 
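As a rough, non-normative sketch of the rules in item 4, the effect of the ``L``, ``R``, ``W`` and ``X`` bits of a matching rule while ``mseccfg.MML`` is set could be modelled as follows. The function name and the permission-string return format are illustrative assumptions and are not defined by this proposal; an empty string stands for an access exception on any access.

```python
def smepmp_mml_permissions(l, r, w, x):
    """Hypothetical helper: permissions granted by a matching PMP rule when
    mseccfg.MML is set. Returns (m_mode, su_mode) as strings built from
    'r', 'w' and 'x'; an empty string means every access is denied."""
    l, r, w, x = bool(l), bool(r), bool(w), bool(x)

    # pmpcfg.LRWX=1111: locked shared data region, read-only for both sides.
    if l and r and w and x:
        return ("r", "r")

    # pmpcfg.RW=01 (R=0, W=1), formerly reserved, encodes a Shared-Region.
    if not r and w:
        if not l:
            # Shared data: M-mode has read/write, X grants S/U-mode write.
            return ("rw", "rw" if x else "r")
        # Locked shared code: execute for both, X grants M-mode read.
        return ("rx" if x else "x", "x")

    # Remaining encodings keep the usual R/W/X meaning, but apply to one side
    # only: L=1 is M-mode-only, L=0 is S/U-mode-only.
    perms = ("r" if r else "") + ("w" if w else "") + ("x" if x else "")
    return (perms, "") if l else ("", perms)
```

For example, ``smepmp_mml_permissions(0, 1, 1, 1)`` describes an _S/U-mode-only_ read/write/execute region that M-mode cannot touch, while ``smepmp_mml_permissions(1, 0, 1, 0)`` describes a locked shared code region that both sides may only execute.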
+ + +==== Truth table when mseccfg.MML is set + +[cols="^1,^1,^1,^1,^3,^3",stripes=even,options="header"] +|=== +4+|Bits on _pmpcfg_ register {set:cellbgcolor:green} 2+|Result +|L|R|W|X|M Mode|S/U Mode +|{set:cellbgcolor:!} 0|0|0|0 2+|Inaccessible region (Access Exception) +|0|0|0|1|Access Exception|Execute-only region +|0|0|1|0 2+|Shared data region: Read/write on M mode, read-only on S/U mode +|0|0|1|1 2+|Shared data region: Read/write for both M and S/U mode +|0|1|0|0|Access Exception|Read-only region +|0|1|0|1|Access Exception|Read/Execute region +|0|1|1|0|Access Exception|Read/Write region +|0|1|1|1|Access Exception|Read/Write/Execute region +|1|0|0|0 2+|Locked inaccessible region* (Access Exception) +|1|0|0|1|Locked Execute-only region*|Access Exception +|1|0|1|0 2+|Locked Shared code region: Execute only on both M and S/U mode.* +|1|0|1|1 2+|Locked Shared code region: Execute only on S/U mode, read/execute on M mode.* +|1|1|0|0|Locked Read-only region*|Access Exception +|1|1|0|1|Locked Read/Execute region*|Access Exception +|1|1|1|0|Locked Read/Write region*|Access Exception +|1|1|1|1 2+|Locked Shared data region: Read only on both M and S/U mode.* +|=== + +*: *Locked* rules cannot be removed or modified until a *PMP reset*, unless ``mseccfg.RLB`` is set. + +==== Visual representation of the proposal + +image::smepmp-visual-representation.png[] + +=== Smepmp software discovery + +Since all fields defined on ``mseccfg`` as part of this proposal are locked when set (``MMWP``/``MML``) or locked when cleared (``RLB``), software can't poll them for determining the presence of Smepmp. It is expected that BootROM will set ``mseccfg.MMWP`` and/or ``mseccfg.MML`` during early boot, before jumping to the firmware, so that the firmware will be able to determine the presence of Smepmp by reading ``mseccfg`` and checking the state of ``mseccfg.MMWP`` and ``mseccfg.MML``. + +[[rationale]] +=== Rationale + +. 
Since a CSR for security and / or global PMP behavior settings is not available with the current spec, we needed to define a new one. This new CSR will allow us to add further security configuration options in the future and also allow developers to verify the existence of the new mechanisms defined on this proposal. + +. There are use cases where developers want to enforce PMP rules in M-mode during the boot process, that they are also able to modify, merge, and / or remove later on. Since a rule that is enforced in M-mode also needs to be locked (or else badly written or malicious M-mode software can remove it at any time), the only way for developers to approach this is to keep adding PMP rules to the chain and rely on rule priority. This is a waste of PMP rules and since it’s only needed during boot, ``mseccfg.RLB`` is a simple workaround that can be used temporarily and then disabled and locked down. ++ +Also when ``mseccfg.MML`` is set, according to 4b it’s not possible to add a _Shared-Region_ rule with executable privileges. So RLB can be set temporarily during the boot process to register such regions. Note that it’s still possible to register executable _Shared-Region_ rules using initial register settings (that may include ``mseccfg.MML`` being set and the rule being set on PMP registers) on *PMP reset*, without using RLB. ++ +[WARNING] +==== +*Be aware that RLB introduces a security vulnerability if left set after the boot process is over and in general it should be used with caution, even when used temporarily.* Having editable PMP rules in M-mode gives a false sense of security since it only takes a few malicious instructions to lift any PMP restrictions this way. It doesn’t make sense to have a security control in place and leave it unprotected. Rule Locking Bypass is only meant as a way to optimize the allocation of PMP rules, catch errors during debugging, and allow the bootrom/firmware to register executable _Shared-Region_ rules.
If developers / vendors have no use for such functionality, they should never set ``mseccfg.RLB`` and if possible hard-wire it to 0. In any case *RLB should be disabled and locked as soon as possible*. +==== ++ +[NOTE] +==== +If ``mseccfg.RLB`` is not used and left unset, it will be locked as soon as a PMP rule/entry with the ``pmpcfg.L`` bit set is configured. +==== ++ +[IMPORTANT] +==== +Since PMP rules with a higher priority override rules with a lower priority, locked rules must precede non-locked rules. +==== + +. With the current spec M-mode can access any memory region unless restricted by a PMP rule with the ``pmpcfg.L`` bit set. There are cases where this approach is overly permissive, and although it’s possible to restrict M-mode by adding PMP rules during the boot process, this can also be seen as a waste of PMP rules. Having the option to block anything by default, and use PMP as a whitelist for M-mode is considered a safer approach. This functionality may be used during the boot process or upon *PMP reset*, using initial register settings. + + +. The current dual meaning of the ``pmpcfg.L`` bit that marks a rule as Locked and *enforced* on all modes is neither flexible nor clean. With the introduction of _Machine Mode Lock-down_ the ``pmpcfg.L`` bit distinguishes between rules that are *enforced* *only* in M-mode (_M-mode-only_) or *only* in S/U-modes (_S/U-mode-only_). The rule locking becomes part of the definition of an _M-mode-only_ rule, since a rule added in M mode, if not locked, can be modified or removed in a few instructions. On the other hand, S/U modes can’t modify PMP rules anyway so locking them doesn’t make sense. + +..
This separation between _M-mode-only_ and _S/U-mode-only_ rules also allows us to distinguish which regions are to be used by processes in Machine mode (``pmpcfg.L == 1``) and which by Supervisor or User mode processes (``pmpcfg.L == 0``), in the same way the U bit on the Virtual Memory’s PTEs marks which Virtual Memory pages are to be used by User mode applications (U=1) and which by the Supervisor / OS (U=0). With this distinction in place we are able to implement memory access and execution prevention in M-mode for any physical memory region that is not _M-mode-only_. ++ +An attacker that manages to tamper with a memory region used by S/U mode, even after successfully tricking a process running in M-mode to use or execute that region, will fail to perform a successful attack since that region will be _S/U-mode-only_ hence any access when in M-mode will trigger an access exception. ++ +[INFO] +==== +In order to support zero-copy transfers between M-mode and S/U-mode we need to either allow shared memory regions, or introduce a mechanism similar to the ``sstatus.SUM`` bit to temporarily allow the high-privileged mode (in this case M-mode) to perform loads and stores on the region of a less-privileged process (in this case S/U-mode). In our case after discussion within the group it seemed a better idea to follow the first approach and have this functionality encoded on a per-rule basis to avoid the risk of leaving a temporary, global bypass active when exiting M-mode, hence rendering memory access prevention useless. +==== ++ +[INFO] +==== +Although it’s possible to use ``mstatus.MPRV`` in M-mode to read/write data on an _S/U-mode-only_ region using general purpose registers for copying, this will happen with S/U-mode permissions, honoring any MMU restrictions put in place by S-mode.
Of course it’s still possible for M-mode to tamper with the page tables and / or add _S/U-mode-only_ rules and bypass the protections put in place by S-mode but if an attacker has managed to compromise M-mode to such extent, no security guarantees are possible in any way. *Also note that the threat model we present here assumes buggy software in M-mode, not compromised software*. We considered disabling ``mstatus.MPRV`` but it seemed too much and out of scope. +==== ++ +_Shared-region_ rules can be used both for zero-copy data transfers and for sharing code segments. The latter may be used for example to allow S/U-mode to execute code by the vendor, that makes use of some vendor-specific ISA extension, without having to go through the firmware with an ecall. This is similar to the vDSO approach followed on Linux, that allows userspace code to execute kernel code without having to perform a system call. ++ +To make sure that shared data regions can’t be executed and shared code regions can’t be modified, the encoding changes the meaning of the ``pmpcfg.X bit``. In case of shared data regions, with the exception of the ``pmpcfg.LRWX=1111`` encoding, the ``pmpcfg.X`` bit marks the capability of S/U-mode to write to that region, so it’s not possible to encode an executable shared data region. In case of shared code regions, the ``pmpcfg.X`` bit marks the capability of M-mode to read from that region, and since ``pmpcfg.RW=01`` is used for encoding the shared region, it’s not possible to encode a shared writable code region. ++ +[NOTE] +==== +For adding _Shared-region_ rules with executable privileges to share code segments between M-mode and S/U-mode, ``mseccfg.RLB`` needs to be implemented, or else such rules can only be added together with ``mseccfg.MML`` being set on *PMP Reset*. 
That's because the reserved encoding ``pmpcfg.RW=01`` being used for _Shared-region_ rules is only defined when ``mseccfg.MML`` is set, and 4b prevents the addition of rules with executable privileges on M-mode after ``mseccfg.MML`` is set unless ``mseccfg.RLB`` is also set. +==== ++ +[INFO] +==== +Using the ``pmpcfg.LRWX=1111`` encoding for a locked shared read-only data region was decided later on; its initial meaning was an M-mode-only read/write/execute region. The reason for that change was that the already defined shared data regions were not locked, so r/w access to M-mode couldn’t be restricted. In the same way we have execute-only shared code regions for both modes, it was decided to also be able to allow a least-privileged shared data region for both modes. This approach allows for example to share the .text section of an ELF with a shared code region and the .rodata section with a locked shared data region, without allowing M-mode to modify .rodata. We also decided that having a locked read/write/execute region in M-mode doesn’t make much sense and could be dangerous, since M-mode won’t be able to add further restrictions there (as in the case of S/U-mode where S-mode can further limit access to a ``pmpcfg.LRWX=0111`` region through the MMU), leaving the possibility of modifying an executable region in M-mode open. +==== ++ +[INFO] +==== +For encoding Shared-region rules initially we used one of the two reserved bits on pmpcfg (bit 5) but in order to avoid allocating an extra bit, since those bits are a very limited resource, it was decided to use the reserved R=0,W=1 combination. +==== +.. The idea with this restriction is that after the Firmware or the OS running in M-mode is initialized and ``mseccfg.MML`` is set, no new code regions are expected to be added since nothing else is expected to run in M-mode (everything else will run in S/U mode).
Since we want to limit the attack surface of the system as much as possible, it makes sense to disallow any new code regions which may include malicious code, to be added/executed in M-mode. + +.. In case ``mseccfg.MMWP`` is not set, M-mode can still access and execute any region not covered by a PMP rule. Since we try to prevent M-mode from executing malicious code and since an attacker may manage to place code on some region not covered by PMP (e.g. a directly-addressable flash memory), we need to ensure that M-mode can only execute the code segments initialized during firmware / OS initialization. + +.. We are only using the encoding ``pmpcfg.RW=01`` together with ``mseccfg.MML``, if ``mseccfg.MML`` is not set the encoding remains usable for future use. + -- cgit v1.1 From 259f923b727bd2ce0c401783644ae017662ce37f Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 13 Feb 2024 11:40:13 -0500 Subject: Delete old vector repo pointer. As suggested by Victor, this removes the link to the old Vector spec GitHub repository. --- src/v-st-ext.adoc | 3 --- 1 file changed, 3 deletions(-) diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc index c9a1b66..21972a3 100644 --- a/src/v-st-ext.adoc +++ b/src/v-st-ext.adoc @@ -1,9 +1,6 @@ [[vector]] == "V" Standard Extension for Vector Operations, Version 1.0 -The specification is currently hosted at -https://github.com/riscv/riscv-v-spec. - [NOTE] ==== _The base vector extension is intended to provide general support for -- cgit v1.1 From 77ad5bdc978681f42f860b3c79bf79726965af50 Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 13 Feb 2024 11:51:32 -0500 Subject: Renaming Vector Calling Convention appendix. As suggested Vector Colling Convention appendix is being renamed to better indicate that it applies to Vector only. 
--- src/calling-convention.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/calling-convention.adoc b/src/calling-convention.adoc index 9ea5505..f5cb079 100644 --- a/src/calling-convention.adoc +++ b/src/calling-convention.adoc @@ -1,5 +1,5 @@ [appendix] -== Calling Convention (Not authoritative - Placeholder Only) +== Calling Convention for Vector State (Not authoritative - Placeholder Only) NOTE: This Appendix is only a placeholder to help explain the conventions used in the code examples, and is not considered frozen or -- cgit v1.1 From aa87978b683a79bd432be6f50b578b5fe5ad91e2 Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 13 Feb 2024 11:57:44 -0500 Subject: Move Fractional Lmul Example into Vector Assembly Code Examples appendix. This moves the Fractional Lmul Example into the Vector Assembly Code Examples appendix instead of a standalone appendix. --- src/fraclmul.adoc | 3 +-- src/riscv-unprivileged.adoc | 2 +- src/vector-examples.adoc | 2 ++ 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/src/fraclmul.adoc b/src/fraclmul.adoc index b872471..6f12f58 100644 --- a/src/fraclmul.adoc +++ b/src/fraclmul.adoc @@ -1,5 +1,4 @@ -[appendix] -== Fractional Lmul example +=== Fractional Lmul example This appendix presents a non-normative example to help explain where compilers can make good use of the fractional LMUL feature. diff --git a/src/riscv-unprivileged.adoc b/src/riscv-unprivileged.adoc index aea1cc1..05e8378 100644 --- a/src/riscv-unprivileged.adoc +++ b/src/riscv-unprivileged.adoc @@ -151,7 +151,7 @@ include::mm-formal.adoc[] //Appendices for Vector include::vector-examples.adoc[] include::calling-convention.adoc[] -include::fraclmul.adoc[] +//include::fraclmul.adoc[] //End of Vector appendices include::index.adoc[] // this is generated generated from index markers. 
diff --git a/src/vector-examples.adoc b/src/vector-examples.adoc index dade5a4..9e54acd 100644 --- a/src/vector-examples.adoc +++ b/src/vector-examples.adoc @@ -121,3 +121,5 @@ vfmul.vv v1, v2, v1, v0.t # x * 1/sqrt(x) ---- include::example/strcmp.s[lines=4..-1] ---- + +include::fraclmul.adoc[] -- cgit v1.1 From f6b2cdfbccb7a8a64e48d32141e5d01439d5fffe Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 13 Feb 2024 13:05:05 -0500 Subject: Trying to fix table rendering in PDF via table font-size and column widths. Trying to fix table rendering in PDF via table font-size and column widths. A side effect may be table breakage elsewhere in the spec. --- src/images/wavedrom/v-inst-table.adoc | 7 ++++--- src/resources/themes/riscv-spec.yml | 1 + 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/src/images/wavedrom/v-inst-table.adoc b/src/images/wavedrom/v-inst-table.adoc index 1c3511b..0c02220 100644 --- a/src/images/wavedrom/v-inst-table.adoc +++ b/src/images/wavedrom/v-inst-table.adoc @@ -1,15 +1,16 @@ // [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] +[cols="<,<,<,<,<,<,<,<,<,<,<,<,<",options="headers"] |=== 5+| Integer 4+| Integer 4+| FP | funct3 | | | | | funct3 | | | | funct3 | | | -| OPIVV |V| | | | OPMVV |V| | | OPFVV |V| | -| OPIVX | |X| | | OPMVX | |X| | OPFVF | |F| +| OPIVV |V| | | | OPMVV{nbsp} |V| | | OPFVV |V| | +| OPIVX | |X| | | OPMVX{nbsp} | |X| | OPFVF | |F| | OPIVI | | |I| | | | | | | | | |=== -// [cols="4,1,1,1,8,4,1,1,8,4,1,1,8"] +[cols="<,<,<,<,<,<,<,<,<,<,<,<,<",options="headers"] |=== 5+| funct6 4+| funct6 4+| funct6 diff --git a/src/resources/themes/riscv-spec.yml b/src/resources/themes/riscv-spec.yml index d514426..7da74cb 100644 --- a/src/resources/themes/riscv-spec.yml +++ b/src/resources/themes/riscv-spec.yml @@ -238,6 +238,7 @@ figure: align: center table: background_color: $page_background_color + font-size: 9 #head_background_color: #2596be #head_font_color: $base_font_color head_font_style: bold -- cgit v1.1 From 
9dc9b5fa1ddf9fbf1d629fea4ec39a039db41353 Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 13 Feb 2024 13:10:42 -0500 Subject: Fixed Vector Fixed-Point Rounding Mode table formatting. I was able to fix the formatting for this table. --- src/v-st-ext.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc index 21972a3..0d569ec 100644 --- a/src/v-st-ext.adoc +++ b/src/v-st-ext.adoc @@ -590,8 +590,8 @@ Then the rounded result is `(v >> d) + r`, where `r` depends on the rounding mode as specified in the following table. .vxrm encoding -[cols="1,1,4,10,5"] -[%autowidth,float="center",align="center",options="header"] +//[cols="1,1,4,10,5"] +[%autowidth,float="center",align="center",cols="<,<,<,<,<",options="header"] |=== 2+| `vxrm[1:0]` | Abbreviation | Rounding Mode | Rounding increment, `r` -- cgit v1.1 From d28d56b5225ae44c811fa1422758e0e951edddc0 Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 13 Feb 2024 14:23:50 -0500 Subject: Adding Base Cache Management Operation ISA Extensions chapter. Added all content for cmo.adoc chapter. Adjusted level of all headings. Added chapter title. Added chapter inclusion to header. --- src/cmo.adoc | 1149 +++++++++++++++++++++++++++++++++++++++++++ src/riscv-unprivileged.adoc | 2 + 2 files changed, 1151 insertions(+) create mode 100644 src/cmo.adoc diff --git a/src/cmo.adoc b/src/cmo.adoc new file mode 100644 index 0000000..648c4ec --- /dev/null +++ b/src/cmo.adoc @@ -0,0 +1,1149 @@ +[[cmo]] +== Base Cache Management Operation ISA Extensions + +[acknowledgments] +=== Acknowledgments + +Contributors to this specification (in alphabetical order) include: + +Allen Baum, +Paul Donahue, +Greg Favor, +Andy Glew, +John Ingalls, +David Kruckemyer, +Josh Scheid, +Philipp Tomsich, +Paul Walmsley, +and +Derek Williams + +We express our gratitude to everyone that contributed to, reviewed, or improved +this specification through their comments and questions. 
+ +=== Pseudocode for instruction semantics + +The semantics of each instruction in the <<#insns>> chapter is expressed in a +SAIL-like syntax. + +[#intro,reftext="Introduction"] +=== Introduction + +_Cache-management operation_ (or _CMO_) instructions perform operations on +copies of data in the memory hierarchy. In general, CMO instructions operate on +cached copies of data, but in some cases, a CMO instruction may operate on +memory locations directly. Furthermore, CMO instructions are grouped by +operation into the following classes: + +* A _management_ instruction manipulates cached copies of data with respect to a + set of agents that can access the data +* A _zero_ instruction zeros out a range of memory locations, potentially + allocating cached copies of data in one or more caches +* A _prefetch_ instruction indicates to hardware that data at a given memory + location may be accessed in the near future, potentially allocating cached + copies of data in one or more caches + +This document introduces a base set of CMO ISA extensions that operate +specifically on cache blocks or the memory locations corresponding to a cache +block; these are known as _cache-block operation_ (or _CBO_) instructions. Each +of the above classes of instructions represents an extension in this +specification: + +* The _Zicbom_ extension defines a set of cache-block management instructions: + `CBO.INVAL`, `CBO.CLEAN`, and `CBO.FLUSH` +* The _Zicboz_ extension defines a cache-block zero instruction: `CBO.ZERO` +* The _Zicbop_ extension defines a set of cache-block prefetch instructions: + `PREFETCH.R`, `PREFETCH.W`, and `PREFETCH.I` + +The execution behavior of the above instructions is also modified by CSR state +added by this specification. + +The remainder of this document provides general background information on CMO +instructions and describes each of the above ISA extensions. + +[NOTE] +==== +_The term CMO encompasses all operations on caches or resources related to +caches. 
The term CBO represents a subset of CMOs that operate only on cache +blocks. The first CMO extensions only define CBOs._ +==== + +[#background,reftext="Background"] +=== Background + +This chapter provides information common to all CMO extensions. + +[#memory-caches,reftext="Memory and Caches"] +==== Memory and Caches + +A _memory location_ is a physical resource in a system uniquely identified by a +_physical address_. An _agent_ is a logic block, such as a RISC-V hart, +accelerator, I/O device, etc., that can access a given memory location. + +[NOTE] +==== +_A given agent may not be able to access all memory locations in a system, and +two different agents may or may not be able to access the same set of memory +locations._ +==== + +A _load operation_ (or _store operation_) is performed by an agent to consume +(or modify) the data at a given memory location. Load and store operations are +performed as a result of explicit memory accesses to that memory location. +Additionally, a _read transfer_ from memory fetches the data at the memory +location, while a _write transfer_ to memory updates the data at the memory +location. + +A _cache_ is a structure that buffers copies of data to reduce average memory +latency. Any number of caches may be interspersed between an agent and a memory +location, and load and store operations from an agent may be satisfied by a +cache instead of the memory location. + +[NOTE] +==== +_Load and store operations are decoupled from read and write transfers by +caches. For example, a load operation may be satisfied by a cache without +performing a read transfer from memory, or a store operation may be satisfied by +a cache that first performs a read transfer from memory._ +==== + +Caches organize copies of data into _cache blocks_, each of which represents a +contiguous, naturally aligned power-of-two (or _NAPOT_) range of memory +locations. 
A cache block is identified by a physical address corresponding to +the underlying memory locations. The capacity and organization of a cache and +the size of a cache block are both _implementation-specific_, and the execution +environment provides software a means to discover information about the caches +and cache blocks in a system. In the initial set of CMO extensions, the size of +a cache block shall be uniform throughout the system. + +[NOTE] +==== +_In future CMO extensions, the requirement for a uniform cache block size may be +relaxed._ +==== + +Implementation techniques such as speculative execution or hardware prefetching +may cause a given cache to allocate or deallocate a copy of a cache block at any +time, provided the corresponding physical addresses are accessible according to +the supported access type PMA and are cacheable according to the cacheability +PMA. Allocating a copy of a cache block results in a read transfer from another +cache or from memory, while deallocating a copy of a cache block may result in a +write transfer to another cache or to memory depending on whether the data in +the copy were modified by a store operation. Additional details are discussed in +<<#coherent-agents-caches>>. + +==== Cache-Block Operations + +A CBO instruction causes one or more operations to be performed on the cache +blocks identified by the instruction. In general, a CBO instruction may identify +one or more cache blocks; however, in the initial set of CMO extensions, CBO +instructions identify a single cache block only. 
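As an informal illustration of what a naturally aligned power-of-two cache block means for addressing, the sketch below computes the block that contains a given physical address. It assumes the block is selected by any address that falls inside it, and the 64-byte size in the usage note is a made-up example value; the real size is implementation-specific and must be discovered from the execution environment. The function names are hypothetical.

```python
def cache_block_base(addr: int, block_size: int) -> int:
    """Return the base address of the NAPOT cache block containing addr.

    block_size is an assumed, implementation-specific value and must be a
    power of two so that the block is naturally aligned."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a positive power of two")
    # Clearing the low log2(block_size) bits yields the aligned base.
    return addr & ~(block_size - 1)


def cache_block_range(addr: int, block_size: int) -> tuple:
    """Return (first_byte, last_byte) of the block containing addr."""
    base = cache_block_base(addr, block_size)
    return (base, base + block_size - 1)
```

With a hypothetical 64-byte block, `cache_block_range(0x1234, 64)` covers `0x1200` through `0x123F`, so every address in that range names the same cache block.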
+ +A cache-block management instruction performs one of the following operations, +relative to the copy of a given cache block allocated in a given cache: + +* An _invalidate operation_ deallocates the copy of the cache block + +* A _clean operation_ performs a write transfer to another cache or to memory if + the data in the copy of the cache block have been modified by a store + operation + +* A _flush operation_ atomically performs a clean operation followed by an + invalidate operation + +Additional details, including the actual operation performed by a given +cache-block management instruction, are described in <<#Zicbom>>. + +A cache-block zero instruction performs a set of store operations that write +zeros to the set of bytes corresponding to a cache block. Unless specified +otherwise, the store operations generated by a cache-block zero instruction have +the same general properties and behaviors that other store instructions in the +architecture have. An implementation may or may not update the entire set of +bytes atomically with a single store operation. Additional details are described +in <<#Zicboz>>. + +A cache-block prefetch instruction is a HINT to the hardware that software +expects to perform a particular type of memory access in the near future. +Additional details are described in <<#Zicbop>>. 
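The three management operations described above can be sketched with a toy model of a single cache in front of memory. This is only an illustration under simplifying assumptions: one cache, whole blocks keyed by their address, and a Python dictionary standing in for memory. The class and method names are invented for this sketch and do not correspond to any instruction encoding or to the SAIL-like pseudocode used elsewhere.

```python
class ToyCache:
    """Toy single-level cache; 'memory' is a dict of block address -> data."""

    def __init__(self, memory):
        self.memory = memory
        self.copies = {}  # block address -> [data, modified_flag]

    def load(self, block):
        # Allocate a copy with a read transfer if none is cached yet.
        if block not in self.copies:
            self.copies[block] = [self.memory[block], False]
        return self.copies[block][0]

    def store(self, block, data):
        # A store modifies the cached copy only; memory is now stale.
        self.load(block)
        self.copies[block] = [data, True]

    def clean(self, block):
        # Write transfer to memory only if a store modified the copy.
        entry = self.copies.get(block)
        if entry is not None and entry[1]:
            self.memory[block] = entry[0]
            entry[1] = False

    def invalidate(self, block):
        # Deallocate the copy; any modified data in it is discarded.
        self.copies.pop(block, None)

    def flush(self, block):
        # Clean followed by invalidate (trivially atomic in this model).
        self.clean(block)
        self.invalidate(block)
```

In this model, a store followed by `clean` makes memory match the cached data while keeping the copy, a store followed by `invalidate` loses the stored data, and `flush` both updates memory and removes the copy.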
+ +[#coherent-agents-caches,reftext="Coherent Agents and Caches"] +=== Coherent Agents and Caches + +For a given memory location, a _set of coherent agents_ consists of the agents +for which all of the following hold: + +* Store operations from all agents in the set appear to be serialized with + respect to each other +* Store operations from all agents in the set eventually appear to all other + agents in the set +* A load operation from an agent in the set returns data from a store operation + from an agent in the set (or from the initial data in memory) + +The coherent agents within such a set shall access a given memory location with +the same physical address and the same physical memory attributes; however, if +the coherence PMA for a given agent indicates a given memory location is not +coherent, that agent shall not be a member of a set of coherent agents with any +other agent for that memory location and shall be the sole member of a set of +coherent agents consisting of itself. + +An agent who is a member of a set of coherent agents is said to be _coherent_ +with respect to the other agents in the set. On the other hand, an agent who is +_not_ a member is said to be _non-coherent_ with respect to the agents in the +set. + +Caches introduce the possibility that multiple copies of a given cache block may +be present in a system at the same time. An _implementation-specific_ mechanism +keeps these copies coherent with respect to the load and store operations from +the agents in the set of coherent agents. Additionally, if a coherent agent in +the set executes a CBO instruction that specifies the cache block, the resulting +operation shall apply to any and all of the copies in the caches that can be +accessed by the load and store operations from the coherent agents. 
+ +[NOTE] +==== +_An operation from a CBO instruction is defined to operate only on the copies of +a cache block that are cached in the caches accessible by the explicit memory +accesses performed by the set of coherent agents. This includes copies of a +cache block in caches that are accessed only indirectly by load and store +operations, e.g. coherent instruction caches._ +==== + +The set of caches subject to the above mechanism form a _set of coherent +caches_, and each coherent cache has the following behaviors, assuming all +operations are performed by the agents in a set of coherent agents: + +* A coherent cache is permitted to allocate and deallocate copies of a cache + block and perform read and write transfers as described in <<#memory-caches>> + +* A coherent cache is permitted to perform a write transfer to memory provided + that a store operation has modified the data in the cache block since the most + recent invalidate, clean, or flush operation on the cache block + +* At least one coherent cache is responsible for performing a write transfer to + memory once a store operation has modified the data in the cache block until + the next invalidate, clean, or flush operation on the cache block, after which + no coherent cache is responsible (or permitted) to perform a write transfer to + memory until the next store operation has modified the data in the cache block + +* A coherent cache is required to perform a write transfer to memory if a store + operation has modified the data in the cache block since the most recent + invalidate, clean, or flush operation on the cache block and if the next clean + or flush operation requires a write transfer to memory + +[NOTE] +==== +_The above restrictions ensure that a "clean" copy of a cache block, fetched by +a read transfer from memory and unmodified by a store operation, cannot later +overwrite the copy of the cache block in memory updated by a write transfer to +memory from a non-coherent agent._ +==== + +A 
non-coherent agent may initiate a cache-block operation that operates on the +set of coherent caches accessed by a set of coherent agents. The mechanism to +perform such an operation is _implementation-specific_. + +==== Memory Ordering + +===== Preserved Program Order + +The preserved program order (abbreviated _PPO_) rules are defined by the RVWMO +memory ordering model. How the operations resulting from CMO instructions fit +into these rules is described below. + +For cache-block management instructions, the resulting invalidate, clean, and +flush operations behave as stores in the PPO rules subject to one additional +overlapping address rule. Specifically, if _a_ precedes _b_ in program order, +then _a_ will precede _b_ in the global memory order if: + +* _a_ is an invalidate, clean, or flush, _b_ is a load, and _a_ and _b_ access + overlapping memory addresses + +[NOTE] +==== +_The above rule ensures that a subsequent load in program order never appears +in the global memory order before a preceding invalidate, clean, or flush +operation to an overlapping address._ +==== + +Additionally, invalidate, clean, and flush operations are classified as W or O +(depending on the physical memory attributes for the corresponding physical +addresses) for the purposes of predecessor and successor sets in `FENCE` +instructions. These operations are _not_ ordered by other instructions that +order stores, e.g. `FENCE.I` and `SFENCE.VMA`. + +For cache-block zero instructions, the resulting store operations behave as +stores in the PPO rules and are ordered by other instructions that order stores. + +Finally, for cache-block prefetch instructions, the resulting operations are +_not_ ordered by the PPO rules nor are they ordered by any other ordering +instructions. + +===== Load Values + +An invalidate operation may change the set of values that can be returned by a +load. 
In particular, an additional condition is added to the Load Value Axiom: + +* If an invalidate operation _i_ precedes a load _r_ and operates on a byte _x_ + returned by _r_, and no store to _x_ appears between _i_ and _r_ in program + order or in the global memory order, then _r_ returns any of the following + values for _x_: + +. If no clean or flush operations on _x_ precede _i_ in the global memory order, + either the initial value of _x_ or the value of any store to _x_ that precedes + _i_ + +. If no store to _x_ precedes a clean or flush operation on _x_ in the global + memory order and if the clean or flush operation on _x_ precedes _i_ in the + global memory order, either the initial value of _x_ or the value of any store + to _x_ that precedes _i_ + +. If a store to _x_ precedes a clean or flush operation on _x_ in the global + memory order and if the clean or flush operation on _x_ precedes _i_ in the + global memory order, either the value of the latest store to _x_ that precedes + the latest clean or flush operation on _x_ or the value of any store to _x_ + that both precedes _i_ and succeeds the latest clean or flush operation on _x_ + that precedes _i_ + +. The value of any store to _x_ by a non-coherent agent regardless of the above + conditions + +[NOTE] +==== +_The first three bullets describe the possible load values at different points +in the global memory order relative to clean or flush operations. The final +bullet implies that the load value may be produced by a non-coherent agent at +any time._ +==== + +==== Traps + +Execution of certain CMO instructions may result in traps due to CSR state, +described in the <<#csr_state>> section, or due to the address translation and +protection mechanisms. The trapping behavior of CMO instructions is described in +the following sections. 
+ +===== Illegal Instruction and Virtual Instruction Exceptions + +Cache-block management instructions and cache-block zero instructions may raise +illegal instruction exceptions or virtual instruction exceptions depending on +the current privilege mode and the state of the CMO control registers described +in the <<#csr_state>> section. + +Cache-block prefetch instructions raise neither illegal instruction exceptions +nor virtual instruction exceptions. + +===== Page Fault, Guest-Page Fault, and Access Fault Exceptions + +Similar to load and store instructions, CMO instructions are explicit memory +access instructions that compute an effective address. The effective address is +ultimately translated into a physical address based on the privilege mode and +the enabled translation mechanisms, and the CMO extensions impose the following +constraints on the physical addresses in a given cache block: + +* The PMP access control bits shall be the same for _all_ physical addresses in + the cache block, and if write permission is granted by the PMP access control + bits, read permission shall also be granted + +* The PMAs shall be the same for _all_ physical addresses in the cache block, + and if write permission is granted by the supported access type PMAs, read + permission shall also be granted + +If the above constraints are not met, the behavior of a CBO instruction is +UNSPECIFIED. + +[NOTE] +==== +_This specification assumes that the above constraints will typically be met for +main memory regions and may be met for certain I/O regions._ +==== + +The Zicboz extension introduces an additional supported access type PMA for +cache-block zero instructions. Main memory regions are required to support +accesses by cache-block zero instructions; however, I/O regions may specify +whether accesses by cache-block zero instructions are supported. 
+ +A cache-block management instruction is permitted to access the specified cache +block whenever a load instruction or store instruction is permitted to access +the corresponding physical addresses. If neither a load instruction nor store +instruction is permitted to access the physical addresses, but an instruction +fetch is permitted to access the physical addresses, whether a cache-block +management instruction is permitted to access the cache block is UNSPECIFIED. If +access to the cache block is not permitted, a cache-block management instruction +raises a store page fault or store guest-page fault exception if address +translation does not permit any access or raises a store access fault exception +otherwise. During address translation, the instruction also checks the accessed +bit and may either raise an exception or set the bit as required. + +[NOTE] +==== +_The interaction between cache-block management instructions and instruction +fetches will be specified in a future extension._ + +_As implied by omission, a cache-block management instruction does not check the +dirty bit and neither raises an exception nor sets the bit._ +==== + +A cache-block zero instruction is permitted to access the specified cache block +whenever a store instruction is permitted to access the corresponding physical +addresses and when the PMAs indicate that cache-block zero instructions are a +supported access type. If access to the cache block is not permitted, a +cache-block zero instruction raises a store page fault or store guest-page fault +exception if address translation does not permit write access or raises a store +access fault exception otherwise. During address translation, the instruction +also checks the accessed and dirty bits and may either raise an exception or set +the bits as required. 
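The exception selection described above for cache-block management and cache-block zero instructions can be sketched as two small functions. This is a simplified illustration under stated assumptions: the function and parameter names are hypothetical, guest-page faults from two-stage translation are folded into "store page fault", the accessed and dirty bit checks are omitted, and the UNSPECIFIED case where only an instruction fetch is permitted is not modeled.

```python
from typing import Optional


def cbo_management_fault(translation_load: bool, translation_store: bool,
                         physical_load: bool, physical_store: bool) -> Optional[str]:
    """Return the exception a cache-block management instruction raises, or None."""
    # Permitted whenever a load or a store to the same addresses is permitted.
    load_ok = translation_load and physical_load
    store_ok = translation_store and physical_store
    if load_ok or store_ok:
        return None
    # Translation permits no access at all: page fault (reported as a store fault).
    if not (translation_load or translation_store):
        return "store page fault"
    # Translation permits some access, so the failure is an access fault.
    return "store access fault"


def cbo_zero_fault(translation_store: bool, physical_store: bool,
                   pma_supports_zero: bool) -> Optional[str]:
    """Return the exception a cache-block zero instruction raises, or None."""
    if not translation_store:
        return "store page fault"
    if not (physical_store and pma_supports_zero):
        return "store access fault"
    return None
```

Both functions report store faults only, which reflects the text's choice of store/AMO exceptions for these instructions regardless of the permission that was actually missing.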
+ +A cache-block prefetch instruction is permitted to access the specified cache +block whenever a load instruction, store instruction, or instruction fetch is +permitted to access the corresponding physical addresses. If access to the cache +block is not permitted, a cache-block prefetch instruction does not raise any +exceptions and shall not access any caches or memory. During address +translation, the instruction does _not_ check the accessed and dirty bits and +neither raises an exception nor sets the bits. + +[NOTE] +==== +_Like a load or store instruction, a CMO instruction may or may not be permitted +to access a cache block based on the states of the `MPRV`, `MPV`, and `MPP` bits +in `mstatus` and the `SUM` and `MXR` bits in `mstatus`, `sstatus`, and +`vsstatus`._ + +_This specification expects that implementations will process cache-block +management instructions like store/AMO instructions, so store/AMO exceptions are +appropriate for these instructions, regardless of the permissions required._ +==== + +===== Address Misaligned Exceptions + +CMO instructions do _not_ generate address misaligned exceptions. + +===== Breakpoint Exceptions and Debug Mode Entry + +Unless otherwise defined by the debug architecture specification, the behavior +of trigger modules with respect to CMO instructions is UNSPECIFIED. + +[NOTE] +==== +_For the Zicbom, Zicboz, and Zicbop extensions, this specification recommends +the following common trigger module behaviors:_ + +* Type 6 address match triggers, i.e. `tdata1.type=6` and `mcontrol6.select=0`, + should be supported + +* Type 2 address/data match triggers, i.e. 
`tdata1.type=2`, should be + unsupported + +* The size of a memory access equals the size of the cache block accessed, and + the compare values follow from the addresses of the NAPOT memory region + corresponding to the cache block containing the effective address + +* Unless an encoding for a cache block is added to the `mcontrol6.size` field, + an address trigger should only match a memory access from a CBO instruction if + `mcontrol6.size=0` + +_If the Zicbom extension is implemented, this specification recommends the +following additional trigger module behaviors:_ + +* Implementing address match triggers should be optional + +* Type 6 data match triggers, i.e. `tdata1.type=6` and `mcontrol6.select=1`, + should be unsupported + +* Memory accesses are considered to be stores, i.e. an address trigger matches + only if `mcontrol6.store=1` + +_If the Zicboz extension is implemented, this specification recommends the +following additional trigger module behaviors:_ + +* Implementing address match triggers should be mandatory + +* Type 6 data match triggers, i.e. `tdata1.type=6` and `mcontrol6.select=1`, + should be supported, and implementing these triggers should be optional + +* Memory accesses are considered to be stores, i.e. an address trigger matches + only if `mcontrol6.store=1` + +_If the Zicbop extension is implemented, this specification recommends the +following additional trigger module behaviors:_ + +* Implementing address match triggers should be optional + +* Type 6 data match triggers, i.e. `tdata1.type=6` and `mcontrol6.select=1`, + should be unsupported + +* Memory accesses may be considered to be loads or stores depending on the + implementation, i.e. 
whether an address trigger matches on these instructions + when `mcontrol6.load=1` or `mcontrol6.store=1` is _implementation-specific_ + +_This specification also recommends that the behavior of trigger modules with +respect to the Zicboz extension should be defined in version 1.0 of the debug +architecture specification. The behavior of trigger modules with respect to the +Zicbom and Zicbop extensions is expected to be defined in future extensions._ +==== + +===== Hypervisor Extension + +For the purposes of writing the `mtinst` or `htinst` register on a trap, the +following standard transformation is defined for cache-block management +instructions and cache-block zero instructions: + +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 'opcode'}, + { bits: 5, name: 0x0 }, + { bits: 3, name: 'funct3'}, + { bits: 5, name: 0x0}, + { bits: 12, name: 'operation'}, +]} +.... + +The `operation` field corresponds to the 12 most significant bits of the +trapping instruction. + +[NOTE] +==== +_As described in the hypervisor extension, a zero may be written into `mtinst` +or `htinst` instead of the standard transformation defined above._ +==== + +==== Effects on Constrained LR/SC Loops + +The following event is added to the list of events that satisfy the eventuality +guarantee provided by constrained LR/SC loops, as defined in the A extension: + +* Some other hart executes a cache-block management instruction or a cache-block + zero instruction to the reservation set of the LR instruction in _H_'s + constrained LR/SC loop. + +[NOTE] +==== +_The above event has been added to accommodate cache coherence protocols that +cannot distinguish between invalidations for stores and invalidations for +cache-block management operations._ + +_Aside from the above event, CMO instructions neither change the properties of +constrained LR/SC loops nor modify the eventuality guarantee provided by them. 
+For example, executing a CMO instruction may cause a constrained LR/SC loop on +any hart to fail periodically or may cause an unconstrained LR/SC sequence on the +same hart to always fail. Additionally, executing a cache-block prefetch +instruction does not impact the eventuality guarantee provided by constrained +LR/SC loops executed on any hart._ +==== + +==== Software Discovery + +The initial set of CMO extensions requires the following information to be +discovered by software: + +* The size of the cache block for management and prefetch instructions +* The size of the cache block for zero instructions +* CBIE support at each privilege level + +Other general cache characteristics may also be specified in the discovery +mechanism. + +[#csr_state,reftext="Control and Status Register State"] +=== Control and Status Register State + +[NOTE] +==== +_The CMO extensions rely on state in {csrname} CSRs that will be defined in a +future update to the privileged architecture. If this CSR update is not +ratified, the CMO extension will define its own CSRs._ +==== + +Three CSRs control the execution of CMO instructions: + +* `m{csrname}` +* `s{csrname}` +* `h{csrname}` + +The `s{csrname}` register is used by all supervisor modes, including VS-mode. A +hypervisor is responsible for saving and restoring `s{csrname}` on guest context +switches. The `h{csrname}` register is only present if the H-extension is +implemented and enabled.
+ +Each `x{csrname}` register (where `x` is `m`, `s`, or `h`) has the following +generic format: + +.Generic Format for x{csrname} CSRs +[cols="^10,^10,80a"] +|=== +| Bits | Name | Description + +| [5:4] | `CBIE` | Cache Block Invalidate instruction Enable + +Enables the execution of the cache block invalidate instruction, `CBO.INVAL`, in +a lower privilege mode: + +* `00`: The instruction raises an illegal instruction or virtual instruction + exception +* `01`: The instruction is executed and performs a flush operation +* `10`: _Reserved_ +* `11`: The instruction is executed and performs an invalidate operation + +| [6] | `CBCFE` | Cache Block Clean and Flush instruction Enable + +Enables the execution of the cache block clean instruction, `CBO.CLEAN`, and the +cache block flush instruction, `CBO.FLUSH`, in a lower privilege mode: + +* `0`: The instruction raises an illegal instruction or virtual instruction + exception +* `1`: The instruction is executed + +| [7] | `CBZE` | Cache Block Zero instruction Enable + +Enables the execution of the cache block zero instruction, `CBO.ZERO`, in a +lower privilege mode: + +* `0`: The instruction raises an illegal instruction or virtual instruction + exception +* `1`: The instruction is executed + +|=== + +The x{csrname} registers control CBO instruction execution based on the current +privilege mode and the state of the appropriate CSRs, as detailed below. 
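The field layout in the table above can be illustrated by extracting the three fields from a raw register value. This is a minimal sketch: `decode_cmo_fields` and `CBIE_MEANING` are hypothetical names, and the function looks only at bits [7:4], ignoring every other bit of the register.

```python
def decode_cmo_fields(value: int) -> dict:
    """Extract the CMO control fields from a raw x{csrname} register value."""
    return {
        "CBIE": (value >> 4) & 0b11,  # bits [5:4]
        "CBCFE": (value >> 6) & 0b1,  # bit [6]
        "CBZE": (value >> 7) & 0b1,   # bit [7]
    }


# Effect of each CBIE encoding on CBO.INVAL in a lower privilege mode.
CBIE_MEANING = {
    0b00: "trap",
    0b01: "flush",
    0b10: "reserved",
    0b11: "invalidate",
}

fields = decode_cmo_fields(0b1101_0000)  # CBZE=1, CBCFE=1, CBIE=01
```

With this example value, `CBO.INVAL` in a lower privilege mode executes but performs a flush operation, and the clean, flush, and zero instructions are enabled.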
+ +A `CBO.INVAL` instruction executes or raises either an illegal instruction +exception or a virtual instruction exception based on the state of the +`x{csrname}.CBIE` fields: + +[source,sail,subs="attributes+"] +-- + +// illegal instruction exceptions +if (((priv_mode != M) && (m{csrname}.CBIE == 00)) || + ((priv_mode == U) && (s{csrname}.CBIE == 00))) +{ + // raise illegal instruction exception +} +// virtual instruction exceptions +else if (((priv_mode == VS) && (h{csrname}.CBIE == 00)) || + ((priv_mode == VU) && ((h{csrname}.CBIE == 00) || (s{csrname}.CBIE == 00)))) +{ + // raise virtual instruction exception +} +// execute instruction +else +{ + if (((priv_mode != M) && (m{csrname}.CBIE == 01)) || + ((priv_mode == U) && (s{csrname}.CBIE == 01)) || + ((priv_mode == VS) && (h{csrname}.CBIE == 01)) || + ((priv_mode == VU) && ((h{csrname}.CBIE == 01) || (s{csrname}.CBIE == 01)))) + { + // execute CBO.INVAL and perform a flush operation + } + else + { + // execute CBO.INVAL and perform an invalidate operation + } +} + + +-- + +[NOTE] +==== +_Until a modified cache block has updated memory, a `CBO.INVAL` instruction may +expose stale data values in memory if the CSRs are programmed to perform an +invalidate operation. This behavior may result in a security hole if lower +privileged level software performs an invalidate operation and accesses +sensitive information in memory._ + +_To avoid such holes, higher privileged level software must perform either a +clean or flush operation on the cache block before permitting lower privileged +level software to perform an invalidate operation on the block.
Alternatively, +higher privileged level software may program the CSRs so that `CBO.INVAL` +either traps or performs a flush operation in a lower privileged level._ +==== + +A `CBO.CLEAN` or `CBO.FLUSH` instruction executes or raises an illegal +instruction or virtual instruction exception based on the state of the +`x{csrname}.CBCFE` bits: + +[source,sail,subs="attributes+"] +-- + +// illegal instruction exceptions +if (((priv_mode != M) && !m{csrname}.CBCFE) || + ((priv_mode == U) && !s{csrname}.CBCFE)) +{ + // raise illegal instruction exception +} +// virtual instruction exceptions +else if (((priv_mode == VS) && !h{csrname}.CBCFE) || + ((priv_mode == VU) && !(h{csrname}.CBCFE && s{csrname}.CBCFE))) +{ + // raise virtual instruction exception +} +// execute instruction +else +{ + // execute CBO.CLEAN or CBO.FLUSH +} + +-- + +Finally, a `CBO.ZERO` instruction executes or raises an illegal instruction or +virtual instruction exception based on the state of the `x{csrname}.CBZE` bits: + +[source,sail,subs="attributes+"] +-- + +// illegal instruction exceptions +if (((priv_mode != M) && !m{csrname}.CBZE) || + ((priv_mode == U) && !s{csrname}.CBZE)) +{ + // raise illegal instruction exception +} +// virtual instruction exceptions +else if (((priv_mode == VS) && !h{csrname}.CBZE) || + ((priv_mode == VU) && !(h{csrname}.CBZE && s{csrname}.CBZE))) +{ + // raise virtual instruction exception +} +// execute instruction +else +{ + // execute CBO.ZERO +} + +-- + +Each `x{csrname}` register is WARL; however, software should determine the legal +values from the execution environment discovery mechanism.
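The `CBO.INVAL` decision given earlier in Sail-style pseudocode can be restated as an executable function, which makes it easy to check individual privilege-mode and CSR combinations. This is a sketch that mirrors the conditions of that pseudocode one for one; the function name, the string-valued privilege modes, and the returned labels are illustrative choices, and the reserved `CBIE` encoding `10` simply falls through as it does in the pseudocode.

```python
def cbo_inval_outcome(priv_mode: str, m_cbie: int, s_cbie: int, h_cbie: int) -> str:
    """Return what a CBO.INVAL does for a privilege mode and three CBIE fields.

    priv_mode is one of "M", "S", "U", "VS", "VU" (hypothetical encoding).
    """
    # Illegal instruction exceptions
    if (priv_mode != "M" and m_cbie == 0b00) or \
       (priv_mode == "U" and s_cbie == 0b00):
        return "illegal instruction exception"
    # Virtual instruction exceptions
    if (priv_mode == "VS" and h_cbie == 0b00) or \
       (priv_mode == "VU" and (h_cbie == 0b00 or s_cbie == 0b00)):
        return "virtual instruction exception"
    # The instruction executes; decide between flush and invalidate
    if (priv_mode != "M" and m_cbie == 0b01) or \
       (priv_mode == "U" and s_cbie == 0b01) or \
       (priv_mode == "VS" and h_cbie == 0b01) or \
       (priv_mode == "VU" and (h_cbie == 0b01 or s_cbie == 0b01)):
        return "flush"
    return "invalidate"
```

For example, M-mode always performs an invalidate regardless of the CSR contents, while a single `01` field anywhere along the delegation path of a lower privilege mode downgrades the operation to a flush.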
+ +[#extensions,reftext="Extensions"] +=== Extensions + +CMO instructions are defined in the following extensions: + +* <<#Zicbom>> +* <<#Zicboz>> +* <<#Zicbop>> + +[#Zicbom,reftext="Cache-Block Management Instructions"] +==== Cache-Block Management Instructions + +Cache-block management instructions enable software running on a set of coherent +agents to communicate with a set of non-coherent agents by performing one of the +following operations: + +* An invalidate operation makes data from store operations performed by a set of + non-coherent agents visible to the set of coherent agents at a point common to + both sets by deallocating all copies of a cache block from the set of coherent + caches up to that point + +* A clean operation makes data from store operations performed by the set of + coherent agents visible to a set of non-coherent agents at a point common to + both sets by performing a write transfer of a copy of a cache block to that + point provided a coherent agent performed a store operation that modified the + data in the cache block since the previous invalidate, clean, or flush + operation on the cache block + +* A flush operation atomically performs a clean operation followed by an + invalidate operation + +In the Zicbom extension, the instructions operate to a point common to _all_ +agents in the system. In other words, an invalidate operation ensures that store +operations from all non-coherent agents are visible to agents in the set of coherent +agents, and a clean operation ensures that store operations from coherent agents are +visible to all non-coherent agents.
+ +[NOTE] +==== +_The Zicbom extension does not prohibit agents that fall outside of the above +architectural definition; however, software cannot rely on the defined cache +operations to have the desired effects with respect to those agents._ + +_Future extensions may define different sets of agents for the purposes of +performance optimization._ +==== + +These instructions operate on the cache block whose effective address is +specified in _rs1_. The effective address is translated into a corresponding +physical address by the appropriate translation mechanisms. + +The following instructions comprise the Zicbom extension: + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|cbo.clean _base_ +|<<#insns-cbo_clean>> + +|✓ +|✓ +|cbo.flush _base_ +|<<#insns-cbo_flush>> + +|✓ +|✓ +|cbo.inval _base_ +|<<#insns-cbo_inval>> + +|=== + +[#Zicboz,reftext="Cache-Block Zero Instructions"] +==== Cache-Block Zero Instructions + +Cache-block zero instructions store zeros to the set of bytes corresponding to a +cache block. An implementation may update the bytes in any order and with any +granularity and atomicity, including individual bytes. + +[NOTE] +==== +_Cache-block zero instructions store zeros independently of whether data from +the underlying memory locations are cacheable. In addition, this specification +does not constrain how the bytes are written._ +==== + +These instructions operate on the cache block, or the memory locations +corresponding to the cache block, whose effective address is specified in _rs1_. +The effective address is translated into a corresponding physical address by the +appropriate translation mechanisms. 
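The architectural effect of a cache-block zero instruction on memory can be sketched as follows. This is an illustration under stated assumptions: `cbo_zero` is a hypothetical helper, memory is a flat `bytearray` with identity address translation, the block is taken to be the naturally aligned power-of-two region containing the effective address, and the 64-byte default is only an example since the real size must be discovered by software.

```python
def cbo_zero(memory: bytearray, effective_address: int, block_size: int = 64) -> range:
    """Zero the cache block containing effective_address; return the zeroed range."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    # Base of the naturally aligned block that contains the effective address.
    base = effective_address & ~(block_size - 1)
    # Bytes may be written in any order and with any granularity; this model
    # writes them one at a time in ascending order.
    for address in range(base, base + block_size):
        memory[address] = 0
    return range(base, base + block_size)


mem = bytearray(b"\xff" * 256)
zeroed = cbo_zero(mem, 0x47)  # any address inside the block selects the whole block
```

Because the instruction does not raise address misaligned exceptions, an effective address anywhere inside the block, such as `0x47` here, selects the same 64 bytes as the block base `0x40`.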
+ +The following instructions comprise the Zicboz extension: + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|cbo.zero _base_ +|<<#insns-cbo_zero>> + +|=== + +[#Zicbop,reftext="Cache-Block Prefetch Instructions"] +==== Cache-Block Prefetch Instructions + +Cache-block prefetch instructions are HINTs to the hardware to indicate that +software intends to perform a particular type of memory access in the near +future. The types of memory accesses are instruction fetch, data read (i.e. +load), and data write (i.e. store). + +These instructions operate on the cache block whose effective address is the sum +of the base address specified in _rs1_ and the sign-extended offset encoded in +_imm[11:0]_, where _imm[4:0]_ shall equal `0b00000`. The effective address is +translated into a corresponding physical address by the appropriate translation +mechanisms. + +[NOTE] +==== +_Cache-block prefetch instructions are encoded as ORI instructions with rd equal +to `0b00000`; however, for the purposes of effective address calculation, this +field is also interpreted as imm[4:0] like a store instruction._ +==== + +The following instructions comprise the Zicbop extension: + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|prefetch.i _offset_(_base_) +|<<#insns-prefetch_i>> + +|✓ +|✓ +|prefetch.r _offset_(_base_) +|<<#insns-prefetch_r>> + +|✓ +|✓ +|prefetch.w _offset_(_base_) +|<<#insns-prefetch_w>> + +|=== + +[#insns,reftext="Instructions"] +=== Instructions + +[#insns-cbo_clean,reftext="Cache Block Clean"] +==== cbo.clean + +Synopsis:: +Perform a clean operation on a cache block + +Mnemonic:: +cbo.clean _offset_(_base_) + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0xF, attr: ['MISC-MEM'] }, + { bits: 5, name: 0x0 }, + { bits: 3, name: 0x2, attr: ['CBO'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 12, name: 0x001, attr: ['CBO.CLEAN'] }, +]} +.... 
+ +Description:: + +A *cbo.clean* instruction performs a clean operation on the cache block whose +effective address is the base address specified in _rs1_. The offset operand may +be omitted; otherwise, any expression that computes the offset shall evaluate to +zero. The instruction operates on the set of coherent caches accessed by the +agent executing the instruction. + +Operation:: +[source,sail] +-- +TODO +-- + +[#insns-cbo_flush,reftext="Cache Block Flush"] +==== cbo.flush + +Synopsis:: +Perform a flush operation on a cache block + +Mnemonic:: +cbo.flush _offset_(_base_) + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0xF, attr: ['MISC-MEM'] }, + { bits: 5, name: 0x0 }, + { bits: 3, name: 0x2, attr: ['CBO'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 12, name: 0x002, attr: ['CBO.FLUSH'] }, +]} +.... + +Description:: + +A *cbo.flush* instruction performs a flush operation on the cache block whose +effective address is the base address specified in _rs1_. The offset operand may +be omitted; otherwise, any expression that computes the offset shall evaluate to +zero. The instruction operates on the set of coherent caches accessed by the +agent executing the instruction. + +Operation:: +[source,sail] +-- +TODO +-- + +[#insns-cbo_inval,reftext="Cache Block Invalidate"] +==== cbo.inval + +Synopsis:: +Perform an invalidate operation on a cache block + +Mnemonic:: +cbo.inval _offset_(_base_) + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0xF, attr: ['MISC-MEM'] }, + { bits: 5, name: 0x0 }, + { bits: 3, name: 0x2, attr: ['CBO'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 12, name: 0x000, attr: ['CBO.INVAL'] }, +]} +.... + +Description:: + +A *cbo.inval* instruction performs an invalidate operation on the cache block +whose effective address is the base address specified in _rs1_. The offset +operand may be omitted; otherwise, any expression that computes the offset shall +evaluate to zero. 
The instruction operates on the set of coherent caches +accessed by the agent executing the instruction. Depending on CSR programming, +the instruction may perform a flush operation instead of an invalidate +operation. + +Operation:: +[source,sail] +-- +TODO +-- + +[#insns-cbo_zero,reftext="Cache Block Zero"] +==== cbo.zero + +Synopsis:: +Store zeros to the full set of bytes corresponding to a cache block + +Mnemonic:: +cbo.zero _offset_(_base_) + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0xF, attr: ['MISC-MEM'] }, + { bits: 5, name: 0x0 }, + { bits: 3, name: 0x2, attr: ['CBO'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 12, name: 0x004, attr: ['CBO.ZERO'] }, +]} +.... + +Description:: + +A *cbo.zero* instruction performs stores of zeros to the full set of bytes +corresponding to the cache block whose effective address is the base address +specified in _rs1_. The offset operand may be omitted; otherwise, any expression +that computes the offset shall evaluate to zero. An implementation may or may +not update the entire set of bytes atomically. + +Operation:: +[source,sail] +-- +TODO +-- + +[#insns-prefetch_i,reftext="Cache Block Prefetch for Instruction Fetch"] +==== prefetch.i + +Synopsis:: +Provide a HINT to hardware that a cache block is likely to be accessed by an +instruction fetch in the near future + +Mnemonic:: +prefetch.i _offset_(_base_) + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 0x0, attr: ['offset[4:0]'] }, + { bits: 3, name: 0x6, attr: ['ORI'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 5, name: 0x0, attr: ['PREFETCH.I'] }, + { bits: 7, name: 'imm[11:5]', attr: ['offset[11:5]'] }, +]} +.... 
+ +Description:: + +A *prefetch.i* instruction indicates to hardware that the cache block whose +effective address is the sum of the base address specified in _rs1_ and the +sign-extended offset encoded in _imm[11:0]_, where _imm[4:0]_ equals `0b00000`, +is likely to be accessed by an instruction fetch in the near future. + +[NOTE] +==== +_An implementation may opt to cache a copy of the cache block in a cache +accessed by an instruction fetch in order to improve memory access latency, but +this behavior is not required._ +==== + +Operation:: +[source,sail] +-- +TODO +-- + +[#insns-prefetch_r,reftext="Cache Block Prefetch for Data Read"] +==== prefetch.r + +Synopsis:: +Provide a HINT to hardware that a cache block is likely to be accessed by a data +read in the near future + +Mnemonic:: +prefetch.r _offset_(_base_) + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 0x0, attr: ['offset[4:0]'] }, + { bits: 3, name: 0x6, attr: ['ORI'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 5, name: 0x1, attr: ['PREFETCH.R'] }, + { bits: 7, name: 'imm[11:5]', attr: ['offset[11:5]'] }, +]} +.... + +Description:: + +A *prefetch.r* instruction indicates to hardware that the cache block whose +effective address is the sum of the base address specified in _rs1_ and the +sign-extended offset encoded in _imm[11:0]_, where _imm[4:0]_ equals `0b00000`, +is likely to be accessed by a data read (i.e. load) in the near future. 
+ +[NOTE] +==== +_An implementation may opt to cache a copy of the cache block in a cache +accessed by a data read in order to improve memory access latency, but this +behavior is not required._ +==== + +Operation:: +[source,sail] +-- +TODO +-- + +[#insns-prefetch_w,reftext="Cache Block Prefetch for Data Write"] +==== prefetch.w + +Synopsis:: +Provide a HINT to hardware that a cache block is likely to be accessed by a data +write in the near future + +Mnemonic:: +prefetch.w _offset_(_base_) + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 0x0, attr: ['offset[4:0]'] }, + { bits: 3, name: 0x6, attr: ['ORI'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 5, name: 0x3, attr: ['PREFETCH.W'] }, + { bits: 7, name: 'imm[11:5]', attr: ['offset[11:5]'] }, +]} +.... + +Description:: + +A *prefetch.w* instruction indicates to hardware that the cache block whose +effective address is the sum of the base address specified in _rs1_ and the +sign-extended offset encoded in _imm[11:0]_, where _imm[4:0]_ equals `0b00000`, +is likely to be accessed by a data write (i.e. store) in the near future. + +[NOTE] +==== +_An implementation may opt to cache a copy of the cache block in a cache +accessed by a data write in order to improve memory access latency, but this +behavior is not required._ +==== + +Operation:: +[source,sail] +-- +TODO +-- + diff --git a/src/riscv-unprivileged.adoc b/src/riscv-unprivileged.adoc index de69a6b..dcfdf47 100644 --- a/src/riscv-unprivileged.adoc +++ b/src/riscv-unprivileged.adoc @@ -47,6 +47,7 @@ endif::[] :hide-uri-scheme: :stem: latexmath :footnote: +:csrname: envcfg _Contributors to all versions of the spec in alphabetical order (please contact editors to suggest corrections): Arvind, Krste Asanović, Rimas Avižienis, Jacob Bachmeyer, Christopher F. 
Batten, @@ -125,6 +126,7 @@ include::zfa.adoc[] //zfa.tex include::ztso-st-ext.adoc[] //ztso.tex +include::cmo.adoc[] include::rv-32-64g.adoc[] //gmaps.tex include::extending.adoc[] -- cgit v1.1 From e76ff4e35631c400a80b44d63d39d7b57356ab58 Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 15 Feb 2024 13:30:03 -0500 Subject: Move contributors to top level & remove Introduction heading. Moved all contributors that did not already exist at the top level of the spec to that section. Removed the contributors section from the zawrs chapter. Changed the ordered list under Operation section to asciidoc. --- src/riscv-unprivileged.adoc | 14 +++++--------- src/zawrs.adoc | 18 +++++------------- 2 files changed, 10 insertions(+), 22 deletions(-) diff --git a/src/riscv-unprivileged.adoc b/src/riscv-unprivileged.adoc index 21db1b5..dce6172 100644 --- a/src/riscv-unprivileged.adoc +++ b/src/riscv-unprivileged.adoc @@ -51,16 +51,12 @@ endif::[] _Contributors to all versions of the spec in alphabetical order (please contact editors to suggest corrections): Arvind, Krste Asanović, Rimas Avižienis, Jacob Bachmeyer, Christopher F. Batten, -Allen J. Baum, Alex Bradbury, Scott Beamer, Preston Briggs, Christopher Celio, Chuanhua -Chang, David Chisnall, Paul Clayton, Palmer Dabbelt, Ken Dockser, Roger Espasa, Greg Favor, -Shaked Flur, Stefan Freudenberger, Marc Gauthier, Andy Glew, Jan Gray, Michael Hamburg, John +Allen J. 
Baum, Abel Bernabeu, Alex Bradbury, Scott Beamer, Preston Briggs, Christopher Celio, Chuanhua +Chang, David Chisnall, Paul Clayton, Palmer Dabbelt, Ken Dockser, Paul Donahue, Aaron Durbin, Roger Espasa, Greg Favor, Shaked Flur, Stefan Freudenberger, Marc Gauthier, Andy Glew, Jan Gray, Michael Hamburg, John Hauser, David Horner, Bruce Hoult, Bill Huffman, Alexandre Joannou, Olof Johansson, Ben Keller, -David Kruckemyer, Yunsup Lee, Paul Loewenstein, Daniel Lustig, Yatin Manerkar, Luc Maranget, -Margaret Martonosi, Joseph Myers, Vijayanand Nagarajan, Rishiyur Nikhil, Jonas Oberhauser, -Stefan O'Rear, Albert Ou, John Ousterhout, David Patterson, Christopher Pulte, Jose Renau, -Josh Scheid, Colin Schmidt, Peter Sewell, Susmit Sarkar, Michael Taylor, Wesley Terpstra, Matt -Thomas, Tommy Thorn, Caroline Trippel, Ray VanDeWalker, Muralidaran Vijayaraghavan, Megan -Wachs, Andrew Waterman, Robert Watson, Derek Williams, Andrew Wright, Reinoud Zandijk, +David Kruckemyer, Tariq Kurd, Yunsup Lee, Paul Loewenstein, Daniel Lustig, Yatin Manerkar, Luc Maranget, +Margaret Martonosi, Phil McCoy, Christoph Müllner, Joseph Myers, Vijayanand Nagarajan, Rishiyur Nikhil, Jonas Oberhauser, Stefan O'Rear, Albert Ou, John Ousterhout, David Patterson, Christopher Pulte, Jose Renau, +Josh Scheid, Colin Schmidt, Peter Sewell, Susmit Sarkar, Ved Shanbhogue, Michael Taylor, Wesley Terpstra, Matt Thomas, Tommy Thorn, Philipp Tomsich, Caroline Trippel, Ray VanDeWalker, Muralidaran Vijayaraghavan, Megan Wachs, Andrew Waterman, Robert Watson, David Weaver, Derek Williams, Andrew Wright, Reinoud Zandijk, and Sizhuo Zhang._ _This document is released under a Creative Commons Attribution 4.0 International License._ diff --git a/src/zawrs.adoc b/src/zawrs.adoc index b47be88..d2858a4 100644 --- a/src/zawrs.adoc +++ b/src/zawrs.adoc @@ -1,12 +1,5 @@ == RISC-V Wait-on-Reservation-Set (Zawrs) extension -=== Contributors - -This RISC-V specification has been contributed to directly or indirectly by: - 
-Aaron Durbin, Abel Bernabeu, Allen Baum, Christoph Müllner, David Weaver, Greg Favor, Josh Scheid, Ken Dockser, Paul Donahue, Phil McCoy, Philipp Tomsich, Tariq Kurd, Ved Shanbhogue - -=== Introduction The Zawrs extension defines a pair of instructions to be used in polling loops that allows a core to enter a low-power state and wait on a store to a memory location. Waiting for a memory location to be updated is a common pattern in @@ -71,13 +64,12 @@ supported in a constrained `LR`/`SC` loop. .... *Operation:* -[source,asciidoc, linenums] -.... + Hart execution may be stalled while the following conditions are all satisfied: - a) The reservation set is valid - b) If `WRS.STO`, a "short" duration since start of stall has not elapsed - c) No pending interrupt is observed (see the rules below) -.... +[loweralpha] + . The reservation set is valid + . If `WRS.STO`, a "short" duration since start of stall has not elapsed + . No pending interrupt is observed (see the rules below) While stalled, an implementation is permitted to occasionally terminate the stall and complete execution for any reason. -- cgit v1.1 From 7d7e9abb39c2cfa10f3536378c177a2bfbc87773 Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 15 Feb 2024 15:03:42 -0500 Subject: Adding Sscofpmf to priv spec. Added sscofpmf.adoc file and included in priv spec. Updated referenced figures to the asciidoc versions. --- src/riscv-privileged.adoc | 1 + 1 file changed, 1 insertion(+) diff --git a/src/riscv-privileged.adoc b/src/riscv-privileged.adoc index 410aeab..086a69b 100644 --- a/src/riscv-privileged.adoc +++ b/src/riscv-privileged.adoc @@ -86,6 +86,7 @@ include::machine.adoc[] include::rnmi.adoc[] //supervisor.tex include::supervisor.adoc[] +include::sscofpmt.adoc[] //hypervisor.tex include::hypervisor.adoc[] //priv-insns.tex -- cgit v1.1 From 78199bdc59816392a84dfb967339c1212d57bf1c Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 15 Feb 2024 15:19:10 -0500 Subject: Added sscofpmt.adoc to git. 
Forgot to git add sscofpmt.adoc. --- src/sscofpmt.adoc | 191 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 191 insertions(+) create mode 100644 src/sscofpmt.adoc diff --git a/src/sscofpmt.adoc b/src/sscofpmt.adoc new file mode 100644 index 0000000..e5a8bdd --- /dev/null +++ b/src/sscofpmt.adoc @@ -0,0 +1,191 @@ +[[Sscofpmf]] +== "Sscofpmf" Count Overflow and Mode-Based Filtering Extension + +The current Privileged specification defines mhpmevent CSRs to select and +control event counting by the associated hpmcounter CSRs, but provides no +standardization of any fields within these CSRs. For at least Linux-class +rich-OS systems it is desirable to standardize certain basic features that are +broadly desired (and have come up over the past year plus on RISC-V lists, as +well as have been the subject of past proposals). This enables there to be +standard upstream software support that eliminates the need for implementations +to provide their own custom software support. + +This extension serves to accomplish exactly this within the existing mhpmevent +CSRs (and correspondingly avoids the unnecessary creation of whole new sets of +CSRs - past just one new CSR). + +This extension sticks to addressing two basic well-understood needs that have +been requested by various people. To make it easy to understand the deltas from +the current Priv 1.11/1.12 specs, this is written as the actual exact changes +to be made to existing paragraphs of Priv spec text (or additional paragraphs +within the existing text). + +The extension name is "Sscofpmf" ('Ss' for Privileged arch and Supervisor-level +extensions, and 'cofpmf' for Count OverFlow and Privilege Mode Filtering). + +Note that the new count overflow interrupt will be treated as a standard local +interrupt that is assigned to bit 13 in the mip/mie/sip/sie registers. 
+ +=== Machine Level Additions + +==== Hardware Performance Monitor + +This extension expands the hardware performance monitor description and extends +the mhpmevent registers to 64 bits (in RV32) as follows: + +The hardware performance monitor includes 29 additional 64-bit event counters +and 29 associated 64-bit event selector registers - the +mhpmcounter3–mhpmcounter31 and mhpmevent3–mhpmevent31 CSRs. + +The mhpmcounters are WARL registers that support up to 64 bits of precision on +RV32 and RV64. + +The mhpmevent__n__ registers are WARL registers that control which event causes +the corresponding counter to increment and what happens when the corresponding +count overflows. Currently just a few bits are defined here. Past this, the +actual selection and meaning of events is defined by the platform, but +(mhpmevent == 0) is defined to mean “no event" and that the corresponding +counter will never be incremented. Typically the lower bits of mhpmevent will +be used for event selection purposes. + +On RV32 only, accesses to the mcycle, minstret, mhpmcounter__n__, and +mhpmevent__n__ CSRs access the low 32 bits, while accesses to the mcycleh, +minstreth, mhpmcounter__n__h, and mhpmevent__n__h CSRs access bits 63–32 of the +corresponding counter or event selector. The proposed CSR numbers for +mhpmevent__n__h are 0x723 - 0x73F. 
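
As a small illustration of the numbering above, the proposed range 0x723 - 0x73F for the 29 registers mhpmevent3h through mhpmevent31h is consistent with a simple base-plus-index rule. The sketch below assumes that rule (`0x720 + n`), which is inferred from the range endpoints rather than stated in the text; the function name is hypothetical.

```python
# Illustrative sketch: the 0x720 + n rule is inferred from the range
# 0x723..0x73F covering n = 3..31; it is an assumption of this example.
def mhpmeventh_csr_number(n: int) -> int:
    """Return the assumed CSR number of mhpmevent<n>h (RV32 only)."""
    if not 3 <= n <= 31:
        raise ValueError("mhpmevent registers are numbered 3..31")
    return 0x720 + n
```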
+ +The following bits are added to mhpmevent: + +bit [63] +++OF+++ - Overflow status and interrupt disable bit that is set when counter overflows + +bit [62] +++MINH+++ - If set, then counting of events in M-mode is inhibited + +bit [61] +++SINH+++ - If set, then counting of events in S/HS-mode is inhibited + +bit [60] +++UINH+++ - If set, then counting of events in U-mode is inhibited + +bit [59] +++VSINH+++ - If set, then counting of events in VS-mode is inhibited + +bit [58] +++VUINH+++ - If set, then counting of events in VU-mode is inhibited + +bit [57] 0 - Reserved for possible future modes + +bit [56] 0 - Reserved for possible future modes + +Each of the five `x`INH bits, when set, inhibit counting of events while in +privilege mode `x`. All-zeroes for these bits results in counting of events in +all modes. + +The OF bit is set when the corresponding hpmcounter overflows, and remains set +until written by software. Since hpmcounter values are unsigned values, +overflow is defined as unsigned overflow of the implemented counter bits. Note +that there is no loss of information after an overflow since the counter wraps +around and keeps counting while the sticky OF bit remains set. + +If supervisor mode is implemented, the 32-bit scountovf register contains +read-only shadow copies of the OF bits in all 32 mhpmevent registers. + +If an hpmcounter overflows while the associated OF bit is zero, then a "count +overflow interrupt request" is generated. If the OF bit is one, then no +interrupt request is generated. Consequently the OF bit also functions as a +count overflow interrupt disable for the associated hpmcounter. + +Count overflow never results from writes to the mhpmcounter__n__ or +mhpmevent__n__ registers, only from hardware increments of counter registers. + +This "count overflow interrupt request" signal is treated as a standard local +interrupt that corresponds to bit 13 in the mip/mie/sip/sie registers. 
The +mip/sip LCOFIP and mie/sie LCOFIE bits are respectively the interrupt-pending +and interrupt-enable bits for this interrupt. ('LCOFI' represents 'Local Count +Overflow Interrupt'.) + +Generation of a "count overflow interrupt request" by an hpmcounter sets the +LCOFIP bit in the mip/sip registers and sets the associated OF bit. The mideleg +register controls the delegation of this interrupt to S-mode versus M-mode. The +LCOFIP bit is cleared by software before servicing the count overflow interrupt +resulting from one or more count overflows. + +[NOTE] +.Non-normative +==== +There are not separate overflow status and overflow interrupt enable bits. In +practice, enabling overflow interrupt generation (by clearing the OF bit) is +done in conjunction with initializing the counter to a starting value. Once a +counter has overflowed, it and the OF bit must be reinitialized before another +overflow interrupt can be generated. +==== + +[NOTE] +.Non-normative +==== +Software can distinguish newly overflowed counters (yet to be serviced by an +overflow interrupt handler) from overflowed counters that have already been +serviced or that are configured to not generate an interrupt on overflow, by +maintaining a bit mask reflecting which counters are active and due to +eventually overflow. +==== + +==== Machine Interrupt Registers (mip and mie) + +This extension adds the description of the LCOFIP/LCOFIE bits in these +registers (and modifies related text) as follows: + +LCOFIP is added to mip in <> as bit 13. LCOFIP is added to mie in +<> as bit 13. + +If the Sscofpmf extension is implemented, bits mip.LCOFIP and mie.LCOFIE are +the interrupt-pending and interrupt-enable bits for local count overflow +interrupts. LCOFIP is read-write in mip and reflects the occurrence of a local +count overflow interrupt request resulting from any of the mhpmevent__n__.OF +bits being set. If the Sscofpmf extension is not implemented, these LCOFIP and +LCOFIE bits are hardwired to zeros. 
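
The mhpmevent bit assignments and the overflow rule described in this chapter can be summarized in a short behavioral model. This is a sketch under stated assumptions, not an implementation: the names `counts_in_mode` and `on_counter_overflow` are hypothetical, register state is passed as plain integers, and delegation through mideleg is not modeled.

```python
# Illustrative model only; bit positions come from the text above, all
# function and constant names are assumptions made for this example.
OF = 1 << 63      # overflow status / interrupt disable
MINH = 1 << 62    # inhibit counting in M-mode
SINH = 1 << 61    # inhibit counting in S/HS-mode
UINH = 1 << 60    # inhibit counting in U-mode
VSINH = 1 << 59   # inhibit counting in VS-mode
VUINH = 1 << 58   # inhibit counting in VU-mode
LCOFI_BIT = 13    # LCOFIP / LCOFIE position in mip/mie/sip/sie

INHIBIT_BIT = {"M": MINH, "S": SINH, "U": UINH, "VS": VSINH, "VU": VUINH}


def counts_in_mode(mhpmevent: int, mode: str) -> bool:
    """True if events are counted while running in the given mode."""
    return (mhpmevent & INHIBIT_BIT[mode]) == 0


def on_counter_overflow(mhpmevent: int, mip: int) -> tuple[int, int]:
    """Apply the overflow rule; return the new (mhpmevent, mip) values."""
    if mhpmevent & OF:
        # OF already set: no interrupt request is generated.
        return mhpmevent, mip
    # OF clear: raise the request, which sets OF and LCOFIP together.
    return mhpmevent | OF, mip | (1 << LCOFI_BIT)
```

Because OF doubles as the interrupt disable, a handler in this model has to clear OF (typically while reloading the counter) before another overflow can raise LCOFIP again.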
+ +Multiple simultaneous interrupts destined for different privilege modes are +handled in decreasing order of destined privilege mode. Multiple simultaneous +interrupts destined for the same privilege mode are handled in the following +decreasing priority order: MEI, MSI, MTI, SEI, SSI, STI, LCOFI. + +=== Supervisor Level Additions + +==== Supervisor Interrupt Registers (sip and sie) + +This extension adds the description of the LCOFIP/LCOFIE bits in these +registers (and modifies related text) as follows: + +LCOFIP is added to sip in <> as bit 13. LCOFIP is added to sie in +<> as bit 13. + +If the Sscofpmf extension is implemented, bits sip.LCOFIP and sie.LCOFIE are +the interrupt-pending and interrupt-enable bits for local count overflow +interrupts. LCOFIP is read-write in sip and reflects the occurrence of a local +count overflow interrupt request resulting from any of the mhpmevent__n__.OF +bits being set. If the Sscofpmf extension is not implemented, these LCOFIP and +LCOFIE bits are hardwired to zeros. + +Each standard interrupt type (LCOFI, SEI, STI, or SSI) may not be implemented, +in which case the corresponding interrupt-pending and interrupt-enable bits are +hardwired to zeros. All bits in sip and sie are WARL fields. + +Multiple simultaneous interrupts destined for supervisor mode are handled in +the following decreasing priority order: SEI, SSI, STI, LCOFI. + +==== Supervisor Count Overflow (scountovf) + +This extension adds this new CSR. + +The scountovf CSR is a 32-bit read-only register that contains shadow copies of +the OF bits in the 29 mhpmevent CSRs (mhpmevent__3__ - mhpmevent__31__) - where +scountovf bit _X_ corresponds to mhpmevent__X__. The proposed CSR number is +0xDA0. + +This register enables supervisor-level overflow interrupt handler software to +quickly and easily determine which counter(s) have overflowed (without needing +to make an execution environment call or series of calls ultimately up to +M-mode). 
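
To illustrate how an overflow handler might consume scountovf, the sketch below turns a raw register value into the list of overflowed counter indices. It assumes the value has already been read from the CSR; the function name is hypothetical, and bits 0 to 2 are ignored because only mhpmevent3 through mhpmevent31 have OF bits.

```python
# Illustrative sketch: scountovf bit X shadows mhpmeventX.OF for X = 3..31.
# Reading the CSR itself is outside the scope of this example.
def overflowed_counters(scountovf: int) -> list[int]:
    """Return the indices n of hpmcounter<n> whose OF bit reads as set."""
    return [n for n in range(3, 32) if (scountovf >> n) & 1]
```

A handler could iterate over the returned indices to service each counter instead of probing every mhpmevent register through a call to a higher privilege level.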
+ +Read access to bit _X_ is subject to the same mcounteren (or mcounteren and +hcounteren) CSRs that mediate access to the hpmcounter CSRs by S-mode (or +VS-mode). In M and S modes, scountovf bit _X_ is readable when mcounteren bit +_X_ is set, and otherwise reads as zero. Similarly, in VS mode, scountovf bit +_X_ is readable when mcounteren bit _X_ and hcounteren bit _X_ are both set, +and otherwise reads as zero. \ No newline at end of file -- cgit v1.1 From 1a06203cdba795bfedea989f7c166eca0ade829a Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 15 Feb 2024 15:44:09 -0500 Subject: Add Sstc.adoc to privileged spec. This integrates the stimecmp/vstimecmp chapter into the privileged spec. --- src/riscv-privileged.adoc | 1 + src/sstc.adoc | 190 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 191 insertions(+) create mode 100644 src/sstc.adoc diff --git a/src/riscv-privileged.adoc b/src/riscv-privileged.adoc index 410aeab..94a6929 100644 --- a/src/riscv-privileged.adoc +++ b/src/riscv-privileged.adoc @@ -88,6 +88,7 @@ include::rnmi.adoc[] include::supervisor.adoc[] //hypervisor.tex include::hypervisor.adoc[] +include::sstc.adoc[] //priv-insns.tex include::priv-insns.adoc[] //priv-history.tex diff --git a/src/sstc.adoc b/src/sstc.adoc new file mode 100644 index 0000000..82e10fc --- /dev/null +++ b/src/sstc.adoc @@ -0,0 +1,190 @@ +[[Sstc]] +== "Stimecmp/Vstimecmp" Extension + +The current Privileged arch specification only defines a hardware mechanism for +generating machine-mode timer interrupts (based on the mtime and mtimecmp +registers), with the resultant requirement that timer services for +S-mode/HS-mode (and for VS-mode) must all be provided by M-mode - via SBI +calls from S/HS-mode up to M-mode (or VS-mode calls to HS-mode and then to +M-mode). 
M-mode software then multiplexes these multiple logical timers onto +its one physical M-mode timer facility, and the M-mode timer interrupt handler +passes timer interrupts back down to the appropriate lower privilege mode. + +This extension serves to provide supervisor mode with its own CSR-based timer +interrupt facility that it can directly manage to provide its own timer service +(in the form of having its own stimecmp register) - thus eliminating the large +overheads for emulating S/HS-mode timers and timer interrupt generation up in +M-mode. Further, this extension adds a similar facility to the Hypervisor +extension for VS-mode. + +To make it easy to understand the deltas from the current Priv 1.11/1.12 specs, +this is written as the actual exact changes to be made to existing paragraphs +of Priv spec text (or additional paragraphs within the existing text). + +The extension name is "Sstc" ('Ss' for Privileged arch and Supervisor-level +extensions, and 'tc' for timecmp). This extension adds the S-level stimecmp CSR +and the VS-level vstimecmp CSR. + +=== Machine and Supervisor Level Additions + +==== *Supervisor Timer Register (stimecmp)* + +This extension adds this new CSR. + +The stimecmp CSR is a 64-bit register and has 64-bit precision on all RV32 and +RV64 systems. In RV32 only, accesses to the stimecmp CSR access the low 32 +bits, while accesses to the stimecmph CSR access the high 32 bits of stimecmp. + +The CSR numbers for stimecmp / stimecmph are 0x14D / 0x15D (within the +Supervisor Trap Setup block of CSRs). + +A supervisor timer interrupt becomes pending - as reflected in the STIP bit in +the mip and sip registers - whenever time contains a value greater than or +equal to stimecmp, treating the values as unsigned integers. Writes to stimecmp +are guaranteed to be reflected in STIP eventually, but not necessarily +immediately. The interrupt remains posted until stimecmp becomes greater than +time - typically as a result of writing stimecmp. 
The interrupt will be taken +based on the standard interrupt enable and delegation rules. + +[NOTE] +.Non-normative +==== +A spurious timer interrupt might occur if an interrupt handler advances +stimecmp then immediately returns, because STIP might not yet have fallen in +the interim. All software should be written to assume this event is possible, +but most software should assume this event is extremely unlikely. It is almost +always more performant to incur an occasional spurious timer interrupt than to +poll STIP until it falls. +==== + +[NOTE] +.Non-normative +==== +In systems in which a supervisor execution environment (SEE) provides timer +facilities via an SBI function call, this SBI call will continue to support +requests to schedule a timer interrupt. The SEE will simply make use of +stimecmp, changing its value as appropriate. This ensures compatibility with +existing S-mode software that uses this SEE facility, while new S-mode software +takes advantage of stimecmp directly. +==== + +==== Machine Interrupt Registers (mip and mie) + +This extension modifies the description of the STIP/STIE bits in these +registers as follows: + +If supervisor mode is implemented, its mip.STIP and mie.STIE are the +interrupt-pending and interrupt-enable bits for supervisor-level timer +interrupts. If the stimecmp register is not implemented, STIP is writable in +mip, and may be written by M-mode software to deliver timer interrupts to +S-mode. If the stimecmp (supervisor-mode timer compare) register is +implemented, STIP is read-only in mip and reflects the supervisor-level timer +interrupt signal resulting from stimecmp. This timer interrupt signal is +cleared by writing stimecmp with a value greater than the current time value. 
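
The pending condition for the supervisor timer interrupt can be written as a single unsigned comparison. The sketch below models only that comparison and the RV32 split into stimecmp and stimecmph; the function names are hypothetical, and the "eventually, but not necessarily immediately" propagation delay is deliberately not modeled.

```python
# Illustrative model only: STIP follows time >= stimecmp with both values
# treated as unsigned 64-bit integers. All names here are assumptions.
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def stip_pending(time: int, stimecmp: int) -> bool:
    """True when the supervisor timer interrupt is pending."""
    return (time & MASK64) >= (stimecmp & MASK64)


def split_for_rv32(stimecmp: int) -> tuple[int, int]:
    """Return (stimecmp, stimecmph) halves as accessed on RV32."""
    value = stimecmp & MASK64
    return value & MASK32, value >> 32
```

In this model, clearing the interrupt corresponds to choosing a new stimecmp for which `stip_pending(time, stimecmp)` is false, that is, a value greater than the current time.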
+ +==== Supervisor Interrupt Registers (sip and sie) + +This extension modifies the description of the STIP/STIE bits in these +registers as follows: + +Bits sip.STIP and sie.STIE are the interrupt-pending and interrupt-enable bits +for supervisor level timer interrupts. If implemented, STIP is read-only in +sip, and is either set and cleared by the execution environment (if stimecmp is +not implemented), or reflects the timer interrupt signal resulting from +stimecmp (if stimecmp is implemented). The sip.STIP bit, in response to timer +interrupts generated by stimecmp, is set and cleared by writing stimecmp with a +value that respectively is less than or equal to, or greater than, the current +time value. + +==== Machine Counter-Enable Register (mcounteren) + +This extension adds to the description of the TM bit in this register as +follows: + +In addition, when the TM bit in the mcounteren register is clear, attempts to +access the stimecmp or vstimecmp register while executing in a mode less +privileged than M will cause an illegal instruction exception. When this bit +is set, access to the stimecmp or vstimecmp register is permitted in S-mode if +implemented, and access to the vstimecmp register (via stimecmp) is permitted +in VS-mode if implemented and not otherwise prevented by the TM bit in +hcounteren. + +=== Hypervisor Extension Additions + +==== *Virtual Supervisor Timer Register (vstimecmp)* + +This extension adds this new CSR. + +The vstimecmp CSR is a 64-bit register and has 64-bit precision on all RV32 and +RV64 systems. In RV32 only, accesses to the vstimecmp CSR access the low 32 +bits, while accesses to the vstimecmph CSR access the high 32 bits of +vstimecmp. + +The proposed CSR numbers for vstimecmp / vstimecmph are 0x24D / 0x25D (within +the Virtual Supervisor Registers block of CSRs, and mirroring the CSR numbers +for stimecmp/stimecmph). 
+ +A virtual supervisor timer interrupt becomes pending - as reflected in the +VSTIP bit in the hip register - whenever (time + htimedelta), truncated to 64 +bits, contains a value greater than or equal to vstimecmp, treating the values +as unsigned integers. Writes to vstimecmp and htimedelta are guaranteed to be +reflected in VSTIP eventually, but not necessarily immediately. The interrupt +remains posted until vstimecmp becomes greater than (time + htimedelta) - +typically as a result of writing vstimecmp. The interrupt will be taken based +on the standard interrupt enable and delegation rules while V=1. + +[NOTE] +.Non-normative +==== +In systems in which a supervisor execution environment (SEE) implemented by an +HS-mode hypervisor provides timer facilities via an SBI function call, this SBI +call will continue to support requests to schedule a timer interrupt. The SEE +will simply make use of vstimecmp, changing its value as appropriate. This +ensures compatibility with existing guest VS-mode software that uses this SEE +facility, while new VS-mode software takes advantage of vstimecmp directly. +==== + +==== Hypervisor Interrupt Registers (hvip, hip, and hie) + +This extension modifies the description of the VSTIP/VSTIE bits in the hip/hie +registers as follows: + +Bits hip.VSTIP and hie.VSTIE are the interrupt-pending and interrupt-enable +bits for VS-level timer interrupts. VSTIP is read-only in hip, and is the +logical-OR of hvip.VSTIP and the timer interrupt signal resulting from +vstimecmp (if vstimecmp is implemented). The hip.VSTIP bit, in response to +timer interrupts generated by vstimecmp, is set and cleared by writing +vstimecmp with a value that respectively is less than or equal to, or greater +than, the current (time + htimedelta) value. The hip.VSTIP bit remains defined +while V=0 as well as V=1. 
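
The VSTIP rule combines two sources: the software-written hvip.VSTIP bit and the comparison of the guest's view of time against vstimecmp. The following sketch models that combination under the assumption that vstimecmp is implemented; the function name and the boolean `hvip_vstip` parameter are hypothetical, and propagation delay is not modeled.

```python
# Illustrative model only: hip.VSTIP is the OR of hvip.VSTIP and the timer
# signal from vstimecmp. Names are assumptions made for this example.
MASK64 = (1 << 64) - 1


def hip_vstip(time: int, htimedelta: int, vstimecmp: int,
              hvip_vstip: bool = False) -> bool:
    """Return the value of hip.VSTIP for the given register contents."""
    guest_time = (time + htimedelta) & MASK64  # truncated to 64 bits
    timer_signal = guest_time >= (vstimecmp & MASK64)  # unsigned compare
    return hvip_vstip or timer_signal
```

The truncation matters when time + htimedelta wraps: with `time = 2**64 - 1` and `htimedelta = 1`, the guest time in this model is 0, so the timer signal is raised only if vstimecmp is 0.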
+ +==== Hypervisor Counter-Enable Register (hcounteren) + +This extension adds to the description of the TM bit in this register as +follows: + +In addition, when the TM bit in the hcounteren register is clear, attempts to +access the vstimecmp register (via stimecmp) while executing in VS-mode will +cause a virtual instruction exception if the same bit in mcounteren is set. +When this bit and the same bit in mcounteren are both set, access to the +vstimecmp register (if implemented) is permitted in VS-mode. + +=== Environment Config (menvcfg/henvcfg) Support + +Enable/disable bits for this extension are provided in the new menvcfg / +henvcfg CSRs. + +Bit 63 of menvcfg (or bit 31 of menvcfgh) - named STCE (STimecmp Enable) - +enables stimecmp for S-mode when set to one, and the same bit of henvcfg +enables vstimecmp for VS-mode. These STCE bits are WARL and are hard-wired to 0 +when this extension is not implemented. + +When STCE in menvcfg is zero, an attempt to access stimecmp or vstimecmp in a +mode other than M-mode raises an illegal instruction exception, STCE in henvcfg +is read-only zero, and STIP in mip and sip reverts to its defined behavior as +if this extension is not implemented. + +When STCE in menvcfg is one but STCE in henvcfg is zero, an attempt to access +stimecmp (really vstimecmp) when V = 1 raises a virtual instruction exception, +and VSTIP in hip reverts to its defined behavior as if this extension is not +implemented. \ No newline at end of file -- cgit v1.1 From c608a9fe75a430226fd0763cfc121c3199c705f1 Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 20 Feb 2024 10:40:01 -0500 Subject: Fixing titles in zawrs chapter. Fixing titles in zawrs chapter. 
--- src/zawrs.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/zawrs.adoc b/src/zawrs.adoc index d2858a4..262949c 100644 --- a/src/zawrs.adoc +++ b/src/zawrs.adoc @@ -1,4 +1,4 @@ -== RISC-V Wait-on-Reservation-Set (Zawrs) extension +== "Zawrs" Statndard extension for Wait-on-Reservation-Set instructions The Zawrs extension defines a pair of instructions to be used in polling loops that allows a core to enter a low-power state and wait on a store to a memory @@ -42,7 +42,7 @@ to be provided by a narrower Zalrsc extension in the future. ==== [[Zawrs]] -=== Zawrs +=== Wait-on-Reservation-Set Instructions The `WRS.NTO` and `WRS.STO` instructions cause the hart to temporarily stall execution in a low-power state as long as the reservation set is valid and no -- cgit v1.1 From 040e76a96ca17aeacc28878cfadae16e8d0e0026 Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 20 Feb 2024 10:52:37 -0500 Subject: Added version 1.01 to zawrs title. Added version 1.01 to zawrs title. --- src/zawrs.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/zawrs.adoc b/src/zawrs.adoc index 262949c..79750d3 100644 --- a/src/zawrs.adoc +++ b/src/zawrs.adoc @@ -1,4 +1,4 @@ -== "Zawrs" Statndard extension for Wait-on-Reservation-Set instructions +== "Zawrs" Statndard extension for Wait-on-Reservation-Set instructions, version 1.01 The Zawrs extension defines a pair of instructions to be used in polling loops that allows a core to enter a low-power state and wait on a store to a memory -- cgit v1.1 From 9ffc12a12281c93dadfa446eb4fd42decbd0a9b3 Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 20 Feb 2024 11:21:27 -0500 Subject: Removed Encoding and Operation titles. To make chapter look like the rest of the document, remvoing "Encoding" and "Operation" titles. 
--- src/zawrs.adoc | 3 --- 1 file changed, 3 deletions(-) diff --git a/src/zawrs.adoc b/src/zawrs.adoc index 79750d3..7443c41 100644 --- a/src/zawrs.adoc +++ b/src/zawrs.adoc @@ -51,7 +51,6 @@ duration is bounded by an implementation defined short timeout. These instructions are available in all privilege modes. These instructions are not supported in a constrained `LR`/`SC` loop. -*Encoding:* [wavedrom, ,svg] .... {reg: [ @@ -63,8 +62,6 @@ supported in a constrained `LR`/`SC` loop. ], config:{lanes: 1, hspace:1024}} .... -*Operation:* - Hart execution may be stalled while the following conditions are all satisfied: [loweralpha] . The reservation set is valid -- cgit v1.1 From d504d5e2ba3ec3c88cabab0c0688200e221824ca Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 20 Feb 2024 11:37:00 -0500 Subject: Adding zawrs table to instructions chapter. Adding zawrs table to the instructions chapter. --- src/rv-32-64g.adoc | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/src/rv-32-64g.adoc b/src/rv-32-64g.adoc index 7714436..1818ddf 100644 --- a/src/rv-32-64g.adoc +++ b/src/rv-32-64g.adoc @@ -442,6 +442,15 @@ ISA. 2+|1101010 |00011 |rs1 |rm |rd |1010011 |FCVT.H.LU |=== +[%autowidth.stretch,float="center",align="center",cols="^2m,^2m,^2m,^2m,<2m,>3m, <4m, >4m, <4m, >4m, <4m, >4m, <4m, >4m, <6m"] +|=== +15+^|Zawrs Standard Extension + +6+^|000000001101 2+^|00000 2+^|000 2+^|00000 2+^|1110011 <|WRS.NTO +6+^|000000011101 2+^|00000 2+^|000 2+^|00000 2+^|1110011 <|WRS.STO +|=== + + <> lists the CSRs that have currently been allocated CSR addresses. The timers, counters, and floating-point CSRs are the only CSRs defined in this specification. -- cgit v1.1 From ef1892c63b3caed42c444c125e0a3b8962bb28ea Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 20 Feb 2024 12:46:42 -0500 Subject: Fixed for zawrs chapter. Capitalized V in version. Added pagebreak to keep bullets on same page. Edited Note to singularize pluralized words. 
--- src/zawrs.adoc | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/src/zawrs.adoc b/src/zawrs.adoc index 7443c41..456c582 100644 --- a/src/zawrs.adoc +++ b/src/zawrs.adoc @@ -1,4 +1,4 @@ -== "Zawrs" Statndard extension for Wait-on-Reservation-Set instructions, version 1.01 +== "Zawrs" Standard extension for Wait-on-Reservation-Set instructions, Version 1.01 The Zawrs extension defines a pair of instructions to be used in polling loops that allows a core to enter a low-power state and wait on a store to a memory @@ -37,10 +37,9 @@ reached. [NOTE] ==== The instructions in the Zawrs extension are only useful in conjunction with the -LR instructions, which are provided by the A extension, and which we also expect +LR instruction, which is provided by the A extension, and which we also expect to be provided by a narrower Zalrsc extension in the future. ==== - [[Zawrs]] === Wait-on-Reservation-Set Instructions @@ -62,6 +61,8 @@ supported in a constrained `LR`/`SC` loop. ], config:{lanes: 1, hspace:1024}} .... +<<< + Hart execution may be stalled while the following conditions are all satisfied: [loweralpha] . The reservation set is valid -- cgit v1.1 From 7dbf812571c68dece8f4e28f71759fc974fe8648 Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 22 Feb 2024 10:41:05 -0500 Subject: Refactor Zc chapter into 1 file. Replaced all of the included content with it's actual content. Reformatted text to match rest of spec. Move zc.adoc to src file with all other chapters. 
--- src/riscv-unprivileged.adoc | 2 +- src/zc.adoc | 2616 +++++++++++++++++++++++++++++++++++++++++++ src/zc/Zc.adoc | 396 ------- 3 files changed, 2617 insertions(+), 397 deletions(-) create mode 100644 src/zc.adoc delete mode 100644 src/zc/Zc.adoc diff --git a/src/riscv-unprivileged.adoc b/src/riscv-unprivileged.adoc index 1ff9228..211267c 100644 --- a/src/riscv-unprivileged.adoc +++ b/src/riscv-unprivileged.adoc @@ -123,7 +123,7 @@ include::ztso-st-ext.adoc[] //ztso.tex include::zawrs.adoc[] -include::zc/Zc.adoc[] +include::zc.adoc[] include::rv-32-64g.adoc[] //gmaps.tex diff --git a/src/zc.adoc b/src/zc.adoc new file mode 100644 index 0000000..28fc904 --- /dev/null +++ b/src/zc.adoc @@ -0,0 +1,2616 @@ +[#Zc] +== "Zc*" Standard Extension for Code Size Reduction + +=== Zc* Overview + +Zc* is a group of extensions that define subsets of the existing C extension (Zca, Zcd, Zcf) and new extensions which only contain 16-bit encodings. + +Zcm* all reuse the encodings for _c.fld_, _c.fsd_, _c.fldsp_, _c.fsdsp_. + +.Zc* extension overview +[width="100%",options=header,cols="3,1,1,1,1,1,1"] +|==================================================================================== +|Instruction |Zca |Zcf |Zcd |Zcb |Zcmp |Zcmt +7+|*The Zca extension is added as way to refer to instructions in the C extension that do not include the floating-point loads and stores* +|C excl. 
c.f* |yes | | | | | +7+|*The Zcf extension is added as a way to refer to compressed single-precision floating-point load/stores* +|c.flw | |rv32 | | | | +|c.flwsp | |rv32 | | | | +|c.fsw | |rv32 | | | | +|c.fswsp | |rv32 | | | | +7+|*The Zcd extension is added as a way to refer to compressed double-precision floating-point load/stores* +|c.fld | | |yes | | | +|c.fldsp | | |yes | | | +|c.fsd | | |yes | | | +|c.fsdsp | | |yes | | | +7+|*Simple operations for use on all architectures* +|c.lbu | | | |yes | | +|c.lh | | | |yes | | +|c.lhu | | | |yes | | +|c.sb | | | |yes | | +|c.sh | | | |yes | | +|c.zext.b | | | |yes | | +|c.sext.b | | | |yes | | +|c.zext.h | | | |yes | | +|c.sext.h | | | |yes | | +|c.zext.w | | | |yes | | +|c.mul | | | |yes | | +|c.not | | | |yes | | +7+|*PUSH/POP and double move which overlap with _c.fsdsp_. Complex operations intended for embedded CPUs* +|cm.push | | | | |yes | +|cm.pop | | | | |yes | +|cm.popret | | | | |yes | +|cm.popretz | | | | |yes | +|cm.mva01s | | | | |yes | +|cm.mvsa01 | | | | |yes | +7+|*Table jump which overlaps with _c.fsdsp_. Complex operations intended for embedded CPUs* +|cm.jt | | | | | |yes +|cm.jalt | | | | | |yes +|==================================================================================== + +[#C] +=== C + +The C extension is the superset of the following extensions: + +* Zca +* Zcf if F is specified (RV32 only) +* Zcd if D is specified + +As C defines the same instructions as Zca, Zcf and Zcd, the rule is that: + +* C always implies Zca +* C+F implies Zcf (RV32 only) +* C+D implies Zcd + +[#Zce] +=== Zce + +The Zce extension is intended to be used for microcontrollers, and includes all relevant Zc extensions. 
+ +* Specifying Zce on RV32 without F includes Zca, Zcb, Zcmp, Zcmt +* Specifying Zce on RV32 with F includes Zca, Zcb, Zcmp, Zcmt _and_ Zcf +* Specifying Zce on RV64 always includes Zca, Zcb, Zcmp, Zcmt +** Zcf doesn't exist for RV64 + +Therefore common ISA strings can be updated as follows to include the relevant Zc extensions, for example: + +* RV32IMC becomes RV32IM_Zce +* RV32IMCF becomes RV32IMF_Zce + +[#misaC] +=== MISA.C + +MISA.C is set if the following extensions are selected: + +* Zca and not F +* Zca, Zcf and F is specified (RV32 only) +* Zca, Zcf and Zcd if D is specified (RV32 only) +** this configuration excludes Zcmp, Zcmt +* Zca, Zcd if D is specified (RV64 only) +** this configuration excludes Zcmp, Zcmt + +[#Zca,Zca] +=== Zca + +The Zca extension is added as a way to refer to instructions in the C extension that do not include the floating-point loads and stores. + +Therefore it _excludes_ all 16-bit floating-point loads and stores: _c.flw_, _c.flwsp_, _c.fsw_, _c.fswsp_, _c.fld_, _c.fldsp_, _c.fsd_, _c.fsdsp_. + +[NOTE] +==== +The C extension only includes F/D instructions when D and F are also specified. +==== + +[#Zcf] +=== Zcf (RV32 only) + +Zcf is the existing set of compressed single-precision floating-point loads and stores: _c.flw_, _c.flwsp_, _c.fsw_, _c.fswsp_. + +Zcf is only relevant to RV32; it cannot be specified for RV64. + +The Zcf extension depends on the <> and F extensions. + +[#Zcd] +=== Zcd + +Zcd is the existing set of compressed double-precision floating-point loads and stores: _c.fld_, _c.fldsp_, _c.fsd_, _c.fsdsp_. + +The Zcd extension depends on the <> and D extensions. + +[#Zcb] +=== Zcb + +Zcb has simple code-size saving instructions which are easy to implement on all CPUs. + +All proposed encodings are currently reserved for all architectures, and have no conflicts with any existing extensions.
+ +NOTE: Zcb can be implemented on _any_ CPU as the instructions are 16-bit versions of existing 32-bit instructions from the application class profile. + +The Zcb extension depends on the <> extension. + +As shown on the individual instruction pages, many of the instructions in Zcb depend upon another extension being implemented. For example, _c.mul_ is only implemented if M or Zmmul is implemented, and _c.sext.b_ is only implemented if Zbb is implemented. + +The _c.mul_ encoding uses the CA register format along with other instructions such as _c.sub_, _c.xor_ etc. + +[NOTE] + + _c.sext.w_ is a pseudo-instruction for _c.addiw rd, 0_ (RV64) + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|yes +|yes +|c.lbu _rd'_, uimm(_rs1'_) +|<<#insns-c_lbu>> + +|yes +|yes +|c.lhu _rd'_, uimm(_rs1'_) +|<<#insns-c_lhu>> + +|yes +|yes +|c.lh _rd'_, uimm(_rs1'_) +|<<#insns-c_lh>> + +|yes +|yes +|c.sb _rs2'_, uimm(_rs1'_) +|<<#insns-c_sb>> + +|yes +|yes +|c.sh _rs2'_, uimm(_rs1'_) +|<<#insns-c_sh>> + +|yes +|yes +|c.zext.b _rsd'_ +|<<#insns-c_zext_b>> + +|yes +|yes +|c.sext.b _rsd'_ +|<<#insns-c_sext_b>> + +|yes +|yes +|c.zext.h _rsd'_ +|<<#insns-c_zext_h>> + +|yes +|yes +|c.sext.h _rsd'_ +|<<#insns-c_sext_h>> + +| +|yes +|c.zext.w _rsd'_ +|<<#insns-c_zext_w>> + +|yes +|yes +|c.not _rsd'_ +|<<#insns-c_not>> + +|yes +|yes +|c.mul _rsd'_, _rs2'_ +|<<#insns-c_mul>> + +|=== + +<<< + +[#Zcmp] +=== Zcmp + +The Zcmp extension is a set of instructions which may be executed as a series of existing 32-bit RISC-V instructions. + +This extension reuses some encodings from _c.fsdsp_. Therefore it is _incompatible_ with <>, + which is included when C and D extensions are both present. + +NOTE: Zcmp is primarily targeted at embedded class CPUs due to implementation complexity. Additionally, it is not compatible with architecture class profiles. + +The Zcmp extension depends on the <> extension. 
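The dependency and compatibility statements made up to this point can be collected into a small check. The sketch below is illustrative only: `check_zc` and its arguments are hypothetical names, the rule set covers only Zcf, Zcd, Zcb and Zcmp as described above (Zcmt and Zce are left out), and it assumes that the cross-references in the dependency sentences point at Zca and that the extension Zcmp conflicts with is Zcd.

```python
def check_zc(exts, xlen):
    """Return a list of problems found in a set of extension names.

    Hypothetical helper: it encodes only the rules stated in the text
    for Zcf, Zcd, Zcb and Zcmp, and assumes the stripped
    cross-references refer to Zca and Zcd.
    """
    exts = set(exts)
    problems = []

    # Each extension on the left needs every extension on the right.
    requires = {
        "Zcb": {"Zca"},
        "Zcd": {"Zca", "D"},
        "Zcf": {"Zca", "F"},
        "Zcmp": {"Zca"},
    }
    for ext, deps in sorted(requires.items()):
        if ext in exts:
            for dep in sorted(deps - exts):
                problems.append(f"{ext} requires {dep}")

    # Zcf cannot be specified for RV64.
    if "Zcf" in exts and xlen != 32:
        problems.append("Zcf is RV32 only")

    # Zcmp reuses some c.fsdsp encodings, so it cannot coexist with Zcd.
    if {"Zcmp", "Zcd"} <= exts:
        problems.append("Zcmp is incompatible with Zcd")

    return problems
```

For instance, `check_zc({"Zca", "Zcd", "D", "Zcmp"}, 64)` reports the Zcmp/Zcd conflict, while `check_zc({"Zca", "Zcb", "Zcmp"}, 32)` returns an empty list.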
+ +The PUSH/POP assembly syntax uses several variables, the meaning of which are: + +* _reg_list_ is a list containing 1 to 13 registers (ra and 0 to 12 s registers) +** valid values: {ra}, {ra, s0}, {ra, s0-s1}, {ra, s0-s2}, ..., {ra, s0-s8}, {ra, s0-s9}, {ra, s0-s11} +** note that {ra, s0-s10} is _not_ valid, giving 12 lists not 13 for better encoding +* _stack_adj_ is the total size of the stack frame. +** valid values vary with register list length and the specific encoding, see the instruction pages for details. + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|yes +|yes +|cm.push _{reg_list}, -stack_adj_ +|<<#insns-cm_push>> + +|yes +|yes +|cm.pop _{reg_list}, stack_adj_ +|<<#insns-cm_pop>> + +|yes +|yes +|cm.popret _{reg_list}, stack_adj_ +|<<#insns-cm_popret>> + +|yes +|yes +|cm.popretz _{reg_list}, stack_adj_ +|<<#insns-cm_popretz>> + +|yes +|yes +|cm.mva01s _rs1', rs2'_ +|<<#insns-cm_mva01s>> + +|yes +|yes +|cm.mvsa01 _r1s', r2s'_ +|<<#insns-cm_mvsa01>> + +|=== + +<<< + +[#Zcmt] +=== Zcmt + +Zcmt adds the table jump instructions and also adds the JVT CSR. The JVT CSR requires a +state enable if Smstateen is implemented. See <> for details. + +This extension reuses some encodings from _c.fsdsp_. Therefore it is _incompatible_ with <>, + which is included when C and D extensions are both present. + +NOTE: Zcmt is primarily targeted at embedded class CPUs due to implementation complexity. Additionally, it is not compatible with architecture class profiles. + +The Zcmt extension depends on the <> and Zicsr extensions. + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|yes +|yes +|cm.jt _index_ +|<<#insns-cm_jt>> + +|yes +|yes +|cm.jalt _index_ +|<<#insns-cm_jalt>> + +|=== + +[#Zc_formats] +=== Zc instruction formats + +Several instructions in this specification use the following new instruction formats. 
+ +[%header,cols="2,3,2,1,1,1,1,1,1,1,1,1,1"] +|===================================================================== +| Format | instructions | 15:10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 +| CLB | c.lbu | funct6 3+| rs1' 2+| uimm 3+| rd' 2+| op +| CSB | c.sb | funct6 3+| rs1' 2+| uimm 3+| rs2' 2+| op +| CLH | c.lhu, c.lh | funct6 3+| rs1' | funct1 | uimm 3+| rd' 2+| op +| CSH | c.sh | funct6 3+| rs1' | funct1 | uimm 3+| rs2' 2+| op +| CU | c.[sz]ext.*, c.not | funct6 3+| rd'/rs1' 5+| funct5 2+| op +| CMMV | cm.mvsa01 cm.mva01s| funct6 3+| r1s' 2+| funct2 3+| r2s' 2+| op +| CMJT | cm.jt cm.jalt | funct6 8+| index 2+| op +| CMPP | cm.push*, cm.pop* | funct6 2+| funct2 4+| urlist 2+| spimm 2+| op +|===================================================================== + +[NOTE] +==== +c.mul uses the existing CA format +==== + +<<< + +[#Zcb_instructions] +=== Zcb instructions + +[#insns-c_lbu,reftext="Load unsigned byte, 16-bit encoding"] +==== c.lbu + +Synopsis: + +Load unsigned byte, 16-bit encoding + +Mnemonic: + +c.lbu _rd'_, _uimm_(_rs1'_) + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x0, attr: ['C0'] }, + { bits: 3, name: 'rd\'' }, + { bits: 2, name: 'uimm[0|1]' }, + { bits: 3, name: 'rs1\'' }, + { bits: 3, name: 0x0 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +The immediate offset is formed as follows: + +[source,sail] +-- + uimm[31:2] = 0; + uimm[1] = encoding[5]; + uimm[0] = encoding[6]; +-- + +Description: + +This instruction loads a byte from the memory address formed by adding `__rs1__` to the zero extended immediate _uimm_. The resulting byte is zero extended to XLEN bits and is written to _rd'_. + +[NOTE] +==== +`__rd__` and `__rs1__` are from the standard 8-register set x8-x15. +==== + +Prerequisites: + +None + +32-bit equivalent: + +<> + +Operation: + +[source,sail] +---- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
+ +X(rdc) = EXTZ(mem[X(rs1c)+EXTZ(uimm)][7..0]); +---- + +<<< +[#insns-c_lhu,reftext="Load unsigned halfword, 16-bit encoding"] +==== c.lhu + +Synopsis: + +Load unsigned halfword, 16-bit encoding + +Mnemonic: + +c.lhu _rd'_, _uimm_(_rs1'_) + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x0, attr: ['C0'] }, + { bits: 3, name: 'rd\'' }, + { bits: 1, name: 'uimm[1]' }, + { bits: 1, name: 0x0 }, + { bits: 3, name: 'rs1\'' }, + { bits: 3, name: 0x1 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + + +The immediate offset is formed as follows: + +[source,sail] +---- + uimm[31:2] = 0; + uimm[1] = encoding[5]; + uimm[0] = 0; +---- + +Description: + +This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting halfword is zero extended to XLEN bits and is written to _rd'_. + +[NOTE] +==== +_rd'_ and _rs1'_ are from the standard 8-register set x8-x15. +==== + +Prerequisites: + +None + +32-bit equivalent: + +<> + +Operation: + +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +X(rdc) = EXTZ(load_mem[X(rs1c)+EXTZ(uimm)][15..0]); +-- + +<<< +[#insns-c_lh,reftext="Load signed halfword, 16-bit encoding"] +==== c.lh + +Synopsis: + +Load signed halfword, 16-bit encoding + +Mnemonic: + +c.lh _rd'_, _uimm_(_rs1'_) + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x0, attr: ['C0'] }, + { bits: 3, name: 'rd\'' }, + { bits: 1, name: 'uimm[1]' }, + { bits: 1, name: 0x1 }, + { bits: 3, name: 'rs1\'' }, + { bits: 3, name: 0x1 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +The immediate offset is formed as follows: + +[source,sail] +---- + uimm[31:2] = 0; + uimm[1] = encoding[5]; + uimm[0] = 0; +---- + +Description: + +This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. 
The resulting halfword is sign extended to XLEN bits and is written to _rd'_. + +[NOTE] +==== +_rd'_ and _rs1'_ are from the standard 8-register set x8-x15. +==== + +Prerequisites: + +None + +32-bit equivalent: + +<> + +Operation: + +[source,sail] +---- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +X(rdc) = EXTS(load_mem[X(rs1c)+EXTZ(uimm)][15..0]); +---- + +<<< +[#insns-c_sb,reftext="Store byte, 16-bit encoding"] +==== c.sb + +Synopsis: + +Store byte, 16-bit encoding + +Mnemonic: + +c.sb _rs2'_, _uimm_(_rs1'_) + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x0, attr: ['C0'] }, + { bits: 3, name: 'rs2\'' }, + { bits: 2, name: 'uimm[0|1]' }, + { bits: 3, name: 'rs1\'' }, + { bits: 3, name: 0x2 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +The immediate offset is formed as follows: + +[source,sail] +---- + uimm[31:2] = 0; + uimm[1] = encoding[5]; + uimm[0] = encoding[6]; +---- + +Description: + +This instruction stores the least significant byte of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. + +[NOTE] +==== +_rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. +==== + +Prerequisites: + +None + +32-bit equivalent: + +<> + +Operation: + +[source,sail] +-- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +mem[X(rs1c)+EXTZ(uimm)][7..0] = X(rs2c) +-- + +<<< +[#insns-c_sh,reftext="Store halfword, 16-bit encoding"] +==== c.sh + +Synopsis: + +Store halfword, 16-bit encoding + +Mnemonic: + +c.sh _rs2'_, _uimm_(_rs1'_) + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x0, attr: ['C0'] }, + { bits: 3, name: 'rs2\'' }, + { bits: 1, name: 'uimm[1]' }, + { bits: 1, name: '0' }, + { bits: 3, name: 'rs1\'' }, + { bits: 3, name: 0x3 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... 
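Reading the _c.sh_ diagram above from bit 0 upward gives the field positions directly. The following is a minimal sketch of a decoder, not a reference implementation: `decode_c_sh` is a hypothetical name, and it assumes the usual compressed-register convention that a 3-bit field value _n_ selects register x(8+_n_), consistent with the x8-x15 set named in the notes.

```python
def decode_c_sh(insn):
    """Decode a 16-bit c.sh encoding; return None if it is not c.sh.

    Hypothetical helper based only on the bit layout in the diagram.
    """
    insn &= 0xFFFF
    op = insn & 0x3                   # bits 1:0, quadrant C0 (0x0)
    rs2c = (insn >> 2) & 0x7          # bits 4:2, rs2'
    uimm1 = (insn >> 5) & 0x1         # bit 5, uimm[1]
    bit6 = (insn >> 6) & 0x1          # bit 6, fixed at 0 for c.sh
    rs1c = (insn >> 7) & 0x7          # bits 9:7, rs1'
    fixed_12_10 = (insn >> 10) & 0x7  # bits 12:10, fixed at 0x3
    funct3 = (insn >> 13) & 0x7       # bits 15:13, fixed at 0x4

    if (op, bit6, fixed_12_10, funct3) != (0x0, 0, 0x3, 0x4):
        return None

    # uimm[0] is always 0 for a halfword access, so only bit 1 is encoded.
    return {"rs2": 8 + rs2c, "rs1": 8 + rs1c, "uimm": uimm1 << 1}
```

With these assumptions, the halfword `0x8D24` decodes as a store of x9 to offset 2 from x10; setting bit 6 instead selects a different encoding and the helper returns `None`.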
+ +The immediate offset is formed as follows: + +[source,sail] +---- + uimm[31:2] = 0; + uimm[1] = encoding[5]; + uimm[0] = 0; +---- + +Description: + +This instruction stores the least significant halfword of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. + +[NOTE] +==== +_rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. +==== + +Prerequisites: + +None + +32-bit equivalent: + +<> + +Operation: +[source,sail] +---- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +mem[X(rs1c)+EXTZ(uimm)][15..0] = X(rs2c) +---- + +<<< +[#insns-c_zext_b,reftext="Zero extend byte, 16-bit encoding"] +==== c.zext.b + +Synopsis: + +Zero extend byte, 16-bit encoding + +Mnemonic: + +c.zext.b _rd'/rs1'_ + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x1, attr: ['C1'] }, + { bits: 3, name: 0x0, attr: ['C.ZEXT.B'] }, + { bits: 2, name: 0x3, attr: ['FUNCT2'] }, + { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, + { bits: 3, name: 0x7 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +Description: + +This instruction takes a single source/destination operand. +It zero-extends the least-significant byte of the operand to XLEN bits by inserting zeros into all of +the bits more significant than 7. + +[NOTE] +==== +_rd'/rs1'_ is from the standard 8-register set x8-x15. +==== + +Prerequisites: + +None + +32-bit equivalent: + +[source,sail] +---- +andi rd'/rs1', rd'/rs1', 0xff +---- + +[NOTE] +==== +The SAIL module variable for _rd'/rs1'_ is called _rsdc_. +==== + +Operation: + +[source,sail] +---- +X(rsdc) = EXTZ(X(rsdc)[7..0]); +---- + +<<< +[#insns-c_sext_b,reftext="Sign extend byte, 16-bit encoding"] +==== c.sext.b + +Synopsis: + +Sign extend byte, 16-bit encoding + +Mnemonic: + +c.sext.b _rd'/rs1'_ + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... 
+{reg:[ + { bits: 2, name: 0x1, attr: ['C1'] }, + { bits: 3, name: 0x1, attr: ['C.SEXT.B'] }, + { bits: 2, name: 0x3, attr: ['FUNCT2'] }, + { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, + { bits: 3, name: 0x7 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +Description: + +This instruction takes a single source/destination operand. +It sign-extends the least-significant byte in the operand to XLEN bits by copying the most-significant bit +in the byte (i.e., bit 7) to all of the more-significant bits. + +[NOTE] +==== +_rd'/rs1'_ is from the standard 8-register set x8-x15. +==== + +Prerequisites: + +Zbb is also required. + +32-bit equivalent: + +<> from Zbb + +[NOTE] +==== +The SAIL module variable for _rd'/rs1'_ is called _rsdc_. +==== + +Operation: + +[source,sail] +---- +X(rsdc) = EXTS(X(rsdc)[7..0]); +---- + +<<< +[#insns-c_zext_h,reftext="Zero extend halfword, 16-bit encoding"] +==== c.zext.h + +Synopsis: + +Zero extend halfword, 16-bit encoding + +Mnemonic: + +c.zext.h _rd'/rs1'_ + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x1, attr: ['C1'] }, + { bits: 3, name: 0x2, attr: ['C.ZEXT.H'] }, + { bits: 2, name: 0x3, attr: ['FUNCT2'] }, + { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, + { bits: 3, name: 0x7 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +Description: + +This instruction takes a single source/destination operand. +It zero-extends the least-significant halfword of the operand to XLEN bits by inserting zeros into all of +the bits more significant than 15. + +[NOTE] +==== +_rd'/rs1'_ is from the standard 8-register set x8-x15. +==== + +Prerequisites: + +Zbb is also required.
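The zero- and sign-extension behaviour described for _c.zext.b_, _c.sext.b_ and _c.zext.h_ (written EXTZ and EXTS in the pseudo-code) can be modelled on plain integers. This is a sketch under stated assumptions: `extz` and `exts` are hypothetical Python stand-ins for the pseudo-code operators, register values are held as non-negative integers, and `xlen` defaults to 32.

```python
def extz(value, width, xlen=32):
    """Zero-extend the low `width` bits of `value` to `xlen` bits."""
    low = value & ((1 << width) - 1)
    return low & ((1 << xlen) - 1)


def exts(value, width, xlen=32):
    """Sign-extend the low `width` bits of `value` to `xlen` bits.

    The result is returned as an unsigned xlen-bit pattern.
    """
    low = value & ((1 << width) - 1)
    if low >> (width - 1):
        # Copy the top bit of the field into every more-significant bit.
        low |= ((1 << xlen) - 1) & ~((1 << width) - 1)
    return low
```

For the register value `0x12345680`, `extz(v, 8)` keeps only `0x80`, whereas `exts(v, 8)` fills bits 31:8 with copies of bit 7.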
+ +32-bit equivalent: + +<> from Zbb + +[NOTE] +==== +The SAIL module variable for _rd'/rs1'_ is called _rsdc_. +==== + +Operation: + +[source,sail] +---- +X(rsdc) = EXTZ(X(rsdc)[15..0]); +---- + +<<< +[#insns-c_sext_h,reftext="Sign extend halfword, 16-bit encoding"] +==== c.sext.h + +Synopsis: + +Sign extend halfword, 16-bit encoding + +Mnemonic: + +c.sext.h _rd'/rs1'_ + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x1, attr: ['C1'] }, + { bits: 3, name: 0x3, attr: ['C.SEXT.H'] }, + { bits: 2, name: 0x3, attr: ['FUNCT2'] }, + { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, + { bits: 3, name: 0x7 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +Description: + +This instruction takes a single source/destination operand. +It sign-extends the least-significant halfword in the operand to XLEN bits by copying the most-significant bit +in the halfword (i.e., bit 15) to all of the more-significant bits. + +[NOTE] +==== +_rd'/rs1'_ is from the standard 8-register set x8-x15. +==== + +Prerequisites: + +Zbb is also required. + +32-bit equivalent: + +<> from Zbb + +[NOTE] +==== +The SAIL module variable for _rd'/rs1'_ is called _rsdc_. +==== + +Operation: + +[source,sail] +---- +X(rsdc) = EXTS(X(rsdc)[15..0]); +---- + +<<< +[#insns-c_zext_w,reftext="Zero extend word, 16-bit encoding"] +==== c.zext.w + +Synopsis: + +Zero extend word, 16-bit encoding + +Mnemonic: + +c.zext.w _rd'/rs1'_ + +Encoding (RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x1, attr: ['C1'] }, + { bits: 3, name: 0x4, attr: ['C.ZEXT.W'] }, + { bits: 2, name: 0x3, attr: ['FUNCT2'] }, + { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, + { bits: 3, name: 0x7 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +Description: + +This instruction takes a single source/destination operand. 
+It zero-extends the least-significant word of the operand to XLEN bits by inserting zeros into all of +the bits more significant than 31. + +[NOTE] +==== +_rd'/rs1'_ is from the standard 8-register set x8-x15. +==== + +Prerequisites: + +Zba is also required. + +32-bit equivalent: + +[source,sail] +---- +add.uw rd'/rs1', rd'/rs1', zero +---- + +[NOTE] +==== +The SAIL module variable for _rd'/rs1'_ is called _rsdc_. +==== + +Operation: + +[source,sail] +---- +X(rsdc) = EXTZ(X(rsdc)[31..0]); +---- + +<<< +[#insns-c_not,reftext="Bitwise not, 16-bit encoding"] +==== c.not + +Synopsis: + +Bitwise not, 16-bit encoding + +Mnemonic: + +c.not _rd'/rs1'_ + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x1, attr: ['C1'] }, + { bits: 3, name: 0x5, attr: ['C.NOT'] }, + { bits: 2, name: 0x3, attr: ['FUNCT2'] }, + { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, + { bits: 3, name: 0x7 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +Description: + +This instruction takes the one's complement of _rd'/rs1'_ and writes the result to the same register. + +[NOTE] +==== + _rd'/rs1'_ is from the standard 8-register set x8-x15. +==== + +Prerequisites: + +None + +32-bit equivalent: + +[source,sail] +---- +xori rd'/rs1', rd'/rs1', -1 +---- + +[NOTE] +==== +The SAIL module variable for _rd'/rs1'_ is called _rsdc_. +==== + +Operation: + +[source,sail] +---- +X(rsdc) = X(rsdc) XOR -1; +---- + +<<< +[#insns-c_mul,reftext="Multiply, 16-bit encoding"] +==== c.mul + +Synopsis: + +Multiply, 16-bit encoding + +Mnemonic: + +c.mul _rsd'_, _rs2'_ + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x1, attr: ['C1'] }, + { bits: 3, name: 'rs2\'', attr: ['SRC2'] }, + { bits: 2, name: 0x2, attr: ['FUNCT2'] }, + { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, + { bits: 3, name: 0x7 }, + { bits: 3, name: 0x4, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... 
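Taken together, the diagrams from _c.zext.b_ through _c.mul_ share bits 15:10 and bits 1:0, and differ only in bits 6:2. A decoder for this group could be sketched as follows. The function and table names are hypothetical, the sketch ignores XLEN (so it does not reject _c.zext.w_ on RV32) and the per-instruction prerequisites, and it assumes a 3-bit register field value _n_ selects x(8+_n_).

```python
# Selector in bits 4:2 when bits 6:5 are 0x3, as read from the diagrams.
CU_OPS = {
    0: "c.zext.b",
    1: "c.sext.b",
    2: "c.zext.h",
    3: "c.sext.h",
    4: "c.zext.w",
    5: "c.not",
}


def decode_cu_group(insn):
    """Decode the c.zext.b .. c.mul group; return None for anything else."""
    insn &= 0xFFFF
    # Bits 15:13 = 0x4 and bits 12:10 = 0x7 are common, as is quadrant C1.
    if (insn >> 10) != 0b100111 or (insn & 0x3) != 0x1:
        return None

    rsd = 8 + ((insn >> 7) & 0x7)   # bits 9:7, rd'/rs1'
    funct2 = (insn >> 5) & 0x3      # bits 6:5
    low3 = (insn >> 2) & 0x7        # bits 4:2, selector or rs2'

    if funct2 == 0x3:
        name = CU_OPS.get(low3)
        return (name, rsd) if name else None
    if funct2 == 0x2:
        return ("c.mul", rsd, 8 + low3)
    return None
```

Under these assumptions `0x9C61` decodes as `c.zext.b` on x8 and `0x9CC9` as `c.mul` with x9 and x10; selector values 6 and 7 are not shown in the diagrams, so the sketch returns `None` for them.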
+ +Description: + +This instruction multiplies XLEN bits of the source operands from _rsd'_ and _rs2'_ and writes the lowest XLEN bits of the result to _rsd'_. + +[NOTE] +==== +_rd'/rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. +==== + +Prerequisites: + +M or Zmmul must be configured. + +32-bit equivalent: + +<> + +[NOTE] +==== +The SAIL module variable for _rd'/rs1'_ is called _rsdc_, and for _rs2'_ is called _rs2c_. +==== + +Operation: + +[source,sail] +---- +let result_wide = to_bits(2 * sizeof(xlen), signed(X(rsdc)) * signed(X(rs2c))); +X(rsdc) = result_wide[(sizeof(xlen) - 1) .. 0]; +---- + +<<< + +[#insns-pushpop,reftext="PUSH/POP Register Instructions"] +=== PUSH/POP register instructions + +These instructions are collectively referred to as PUSH/POP: + +* <<#insns-cm_push>> +* <<#insns-cm_pop>> +* <<#insns-cm_popret>> +* <<#insns-cm_popretz>> + +The term PUSH refers to _cm.push_. + +The term POP refers to _cm.pop_. + +The term POPRET refers to _cm.popret and cm.popretz_. + +Common details for these instructions are in this section. + +==== PUSH/POP functional overview + +PUSH, POP, POPRET are used to reduce the size of function prologues and epilogues. + +. The PUSH instruction +** adjusts the stack pointer to create the stack frame +** pushes (stores) the registers specified in the register list to the stack frame + +. The POP instruction +** pops (loads) the registers in the register list from the stack frame +** adjusts the stack pointer to destroy the stack frame + +. The POPRET instructions +** pop (load) the registers in the register list from the stack frame +** _cm.popretz_ also moves zero into _a0_ as the return value +** adjust the stack pointer to destroy the stack frame +** execute a _ret_ instruction to return from the function + +<<< +==== Example usage + +This example gives an illustration of the use of PUSH and POPRET. 
+ +The function _processMarkers_ in the EMBench benchmark picojpeg in the following file on github: https://github.com/embench/embench-iot/blob/master/src/picojpeg/libpicojpeg.c[libpicojpeg.c] + +The prologue and epilogue compile with GCC10 to: + +[source,SAIL] +---- + + 0001098a : + 1098a: 711d addi sp,sp,-96 ;#cm.push(1) + 1098c: c8ca sw s2,80(sp) ;#cm.push(2) + 1098e: c6ce sw s3,76(sp) ;#cm.push(3) + 10990: c4d2 sw s4,72(sp) ;#cm.push(4) + 10992: ce86 sw ra,92(sp) ;#cm.push(5) + 10994: cca2 sw s0,88(sp) ;#cm.push(6) + 10996: caa6 sw s1,84(sp) ;#cm.push(7) + 10998: c2d6 sw s5,68(sp) ;#cm.push(8) + 1099a: c0da sw s6,64(sp) ;#cm.push(9) + 1099c: de5e sw s7,60(sp) ;#cm.push(10) + 1099e: dc62 sw s8,56(sp) ;#cm.push(11) + 109a0: da66 sw s9,52(sp) ;#cm.push(12) + 109a2: d86a sw s10,48(sp);#cm.push(13) + 109a4: d66e sw s11,44(sp);#cm.push(14) +... + 109f4: 4501 li a0,0 ;#cm.popretz(1) + 109f6: 40f6 lw ra,92(sp) ;#cm.popretz(2) + 109f8: 4466 lw s0,88(sp) ;#cm.popretz(3) + 109fa: 44d6 lw s1,84(sp) ;#cm.popretz(4) + 109fc: 4946 lw s2,80(sp) ;#cm.popretz(5) + 109fe: 49b6 lw s3,76(sp) ;#cm.popretz(6) + 10a00: 4a26 lw s4,72(sp) ;#cm.popretz(7) + 10a02: 4a96 lw s5,68(sp) ;#cm.popretz(8) + 10a04: 4b06 lw s6,64(sp) ;#cm.popretz(9) + 10a06: 5bf2 lw s7,60(sp) ;#cm.popretz(10) + 10a08: 5c62 lw s8,56(sp) ;#cm.popretz(11) + 10a0a: 5cd2 lw s9,52(sp) ;#cm.popretz(12) + 10a0c: 5d42 lw s10,48(sp);#cm.popretz(13) + 10a0e: 5db2 lw s11,44(sp);#cm.popretz(14) + 10a10: 6125 addi sp,sp,96 ;#cm.popretz(15) + 10a12: 8082 ret ;#cm.popretz(16) +---- + +<<< + +with the GCC option _-msave-restore_ the output is the following: + +[source,SAIL] +---- +0001080e : + 1080e: 73a012ef jal t0,11f48 <__riscv_save_12> + 10812: 1101 addi sp,sp,-32 +... + 10862: 4501 li a0,0 + 10864: 6105 addi sp,sp,32 + 10866: 71e0106f j 11f84 <__riscv_restore_12> +---- + +with PUSH/POPRET this reduces to + +[source,SAIL] +---- +0001080e : + 1080e: b8fa cm.push {ra,s0-s11},-96 +... 
+ 10866: bcfa cm.popretz {ra,s0-s11}, 96 +---- + +The prologue / epilogue reduce from 60-bytes in the original code, to 14-bytes with _-msave-restore_, +and to 4-bytes with PUSH and POPRET. +As well as reducing the code-size PUSH and POPRET eliminate the branches from +calling the millicode _save/restore_ routines and so may also perform better. + +[NOTE] +==== +The calls to _/_ become 64-bit when the target functions are out of the ±1MB range, increasing the prologue/epilogue size to 22-bytes. +==== + +[NOTE] +==== +POP is typically used in tail-calling sequences where _ret_ is not used to return to _ra_ after destroying the stack frame. +==== + +[#pushpop-areg-list] + +===== Stack pointer adjustment handling + +The instructions all automatically adjust the stack pointer by enough to cover the memory required for the registers being saved or restored. +Additionally the _spimm_ field in the encoding allows the stack pointer to be adjusted in additional increments of 16-bytes. There is only a small restricted +range available in the encoding; if the range is insufficient then a separate _c.addi16sp_ can be used to increase the range. + +===== Register list handling + +There is no support for the _{ra, s0-s10}_ register list without also adding _s11_. Therefore the _{ra, s0-s11}_ register list must be used in this case. + +[#pushpop-idempotent-memory] +==== PUSH/POP Fault handling + +Correct execution requires that _sp_ refers to idempotent memory (also see <>), because the core must be able to +handle traps detected during the sequence. +The entire PUSH/POP sequence is re-executed after returning from the trap handler, and multiple traps are possible during the sequence. + +If a trap occurs during the sequence then _xEPC_ is updated with the PC of the instruction, _xTVAL_ (if not read-only-zero) updated with the bad address if it was an access fault and _xCAUSE_ updated with the type of trap. 
+ +NOTE: It is implementation defined whether interrupts can also be taken during the sequence execution. + +[#pushpop-software-view] +==== Software view of execution + +===== Software view of the PUSH sequence + +From a software perspective the PUSH sequence appears as: + +* A sequence of stores writing the bytes required by the pseudo-code +** The bytes may be written in any order. +** The bytes may be grouped into larger accesses. +** Any of the bytes may be written multiple times. +* A stack pointer adjustment + +[NOTE] +==== +If an implementation allows interrupts during the sequence, and the interrupt handler uses _sp_ to allocate stack memory, then any stores which were executed before the interrupt may be overwritten by the handler. This is safe because the memory is idempotent and the stores will be re-executed when execution resumes. +==== + +The stack pointer adjustment must be committed only when it is certain that the entire PUSH instruction will commit. + +Stores may also return imprecise faults from the bus. +It is platform defined whether the core implementation waits for the bus responses before continuing to the final stage of the sequence, +or handles error responses after completing the PUSH instruction.
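The rule that stores may be repeated while the stack pointer adjustment commits only once can be illustrated with a toy model. This is a sketch, not the specified behaviour of any implementation: `cm_push`, `StoreFault` and the `fault_at` knob are hypothetical, the model is RV32 only (4-byte slots), and it writes the registers in one fixed order although the text allows any order.

```python
class StoreFault(Exception):
    """Stands in for a precise trap raised by one of the stores."""


def cm_push(regs, mem, reg_list, stack_adj, fault_at=None):
    """Toy RV32 model of cm.push: stores first, sp adjustment last.

    regs      dict of register name -> value (must contain 'sp')
    mem       dict of word address -> value
    reg_list  names in ascending order, e.g. ['ra', 's0', 's1']
    fault_at  hypothetical knob: index of the store that traps, or None
    """
    addr = regs["sp"]
    for i, name in enumerate(reversed(reg_list)):
        addr -= 4
        if fault_at == i:
            raise StoreFault(addr)   # sp has not been modified yet
        mem[addr] = regs[name]
    regs["sp"] -= stack_adj          # committed once, after every store
```

If a store traps, `sp` is unchanged and some words below it have already been written; re-running the whole call after the handler returns rewrites those words with the same values and then commits the adjustment.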
+ +<<< + +For example: + +[source,sail] +---- +cm.push {ra, s0-s5}, -64 +---- + +Appears to software as: + +[source,sail] +---- +# any bytes from sp-1 to sp-28 may be written multiple times before +# the instruction completes, therefore these updates may be visible in +# the interrupt/exception handler below the stack pointer +sw s5, -4(sp) +sw s4, -8(sp) +sw s3,-12(sp) +sw s2,-16(sp) +sw s1,-20(sp) +sw s0,-24(sp) +sw ra,-28(sp) + +# this must only execute once, and will only execute after all stores +# completed without any precise faults, therefore this update is only +# visible in the interrupt/exception handler if cm.push has completed +addi sp, sp, -64 +---- + +===== Software view of the POP/POPRET sequence + +From a software perspective the POP/POPRET sequence appears as: + +* A sequence of loads reading the bytes required by the pseudo-code. +** The bytes may be loaded in any order. +** The bytes may be grouped into larger accesses. +** Any of the bytes may be loaded multiple times. +* A stack pointer adjustment +* An optional `li a0, 0` +* An optional `ret` + +If a trap occurs during the sequence, then any loads which were executed before the trap may update architectural state. +The loads will be re-executed once the trap handler completes, so the values will be overwritten. +Therefore it is permitted for an implementation to update some of the destination registers before taking a fault. + +The optional `li a0, 0`, stack pointer adjustment and optional `ret` must be committed only when it is certain that the entire POP/POPRET instruction will commit. + +For POPRET, once the stack pointer adjustment has been committed, the `ret` must execute.
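The software-visible sequence for the three POP-family instructions can be generated mechanically from the register list and the stack adjustment. The helper below is a sketch with a hypothetical name (`pop_sequence`); it covers RV32 only (4-byte slots), emits the loads in one fixed order even though the text permits any order, and places `li a0, 0` before the stack pointer adjustment.

```python
def pop_sequence(mnemonic, reg_list, stack_adj):
    """Return the RV32 software-visible sequence as a list of strings.

    reg_list is in ascending order, e.g. ['ra', 's0', 's1'].
    """
    if mnemonic not in ("cm.pop", "cm.popret", "cm.popretz"):
        raise ValueError(f"not a POP-family mnemonic: {mnemonic}")

    seq = []
    offset = stack_adj
    # The highest-numbered register sits at the top of the frame.
    for name in reversed(reg_list):
        offset -= 4
        seq.append(f"lw {name}, {offset}(sp)")

    if mnemonic == "cm.popretz":
        seq.append("li a0, 0")
    seq.append(f"addi sp, sp, {stack_adj}")
    if mnemonic != "cm.pop":
        seq.append("ret")
    return seq
```

Only the loads may be repeated; everything the helper appends after the loop corresponds to the part that must commit exactly once.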
+ +<<< +For example: + +[source,sail] +---- +cm.popretz {ra, s0-s3}, 32; +---- + +Appears to software as: + +[source,sail] +---- +# any or all of these load instructions may execute multiple times, +# therefore these updates may be visible in the interrupt/exception handler +lw s3, 28(sp) +lw s2, 24(sp) +lw s1, 20(sp) +lw s0, 16(sp) +lw ra, 12(sp) + +# these must only execute once, and will only execute after all loads +# complete successfully; all instructions must execute atomically, +# therefore these updates are not visible in the interrupt/exception handler +li a0, 0 +addi sp, sp, 32 +ret +---- + +[[pushpop_non-idem-mem]] +==== Non-idempotent memory handling + +An implementation may have a requirement to issue a PUSH/POP instruction to non-idempotent memory. + +If the core implementation does not support PUSH/POP to non-idempotent memories, the core may use an idempotency PMA to detect it and take a +load (POP/POPRET) or store (PUSH) access fault exception in order to avoid unpredictable results. + +Software should only use these instructions on non-idempotent memory regions when software can tolerate the required memory accesses +being issued repeatedly in the case that they cause exceptions. + +<<< + +==== Example RV32I PUSH/POP sequences + +The examples included show the load/store series expansion and the stack adjustment. +Examples of _cm.popret_ and _cm.popretz_ are not included, as the difference in the expanded sequence from _cm.pop_ is trivial in all cases.
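Each example in this section states an _rlist_ and _spimm_ value alongside the assembly. The relationship can be sketched as below, assuming RV32I. The helper name `rv32i_fields` is hypothetical, and the `stack_adj_base` thresholds are copied from the RV32I table on the _cm.push_ instruction page.

```python
# Number of s registers after ra -> rlist. {ra, s0-s10} has no encoding,
# so 11 is deliberately absent.
S_REG_COUNT_TO_RLIST = {n: 4 + n for n in range(0, 11)}  # {ra} .. {ra, s0-s9}
S_REG_COUNT_TO_RLIST[12] = 15                            # {ra, s0-s11}


def rv32i_fields(num_s_regs, stack_adj):
    """Return (rlist, spimm) for an RV32I PUSH/POP with the given frame size."""
    rlist = S_REG_COUNT_TO_RLIST[num_s_regs]  # KeyError for s0-s10

    # stack_adj_base for RV32I, grouped by rlist range.
    if rlist <= 7:
        base = 16
    elif rlist <= 11:
        base = 32
    elif rlist <= 14:
        base = 48
    else:
        base = 64

    extra = stack_adj - base
    # spimm is a 2-bit field counting additional 16-byte increments.
    if extra < 0 or extra % 16 != 0 or extra // 16 > 3:
        raise ValueError(f"stack_adj {stack_adj} not encodable for rlist {rlist}")
    return rlist, extra // 16
```

For example, `{ra, s0-s2}` with a 64-byte frame gives `rlist` 7 and `spimm` 3, and a frame size that is out of range raises `ValueError`, which is the case where a separate _c.addi16sp_ would be needed.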
+ +===== cm.push {ra, s0-s2}, -64 + +Encoding: _rlist_=7, _spimm_=3 + +expands to: + +[source,sail] +---- +sw s2, -4(sp); +sw s1, -8(sp); +sw s0, -12(sp); +sw ra, -16(sp); +addi sp, sp, -64; +---- + +===== cm.push {ra, s0-s11}, -112 + +Encoding: _rlist_=15, _spimm_=3 + +expands to: + +[source,sail] +---- +sw s11, -4(sp); +sw s10, -8(sp); +sw s9, -12(sp); +sw s8, -16(sp); +sw s7, -20(sp); +sw s6, -24(sp); +sw s5, -28(sp); +sw s4, -32(sp); +sw s3, -36(sp); +sw s2, -40(sp); +sw s1, -44(sp); +sw s0, -48(sp); +sw ra, -52(sp); +addi sp, sp, -112; +---- + +<<< + +===== cm.pop {ra}, 16 + +Encoding: _rlist_=4, _spimm_=0 + +expands to: + +[source,sail] +---- +lw ra, 12(sp); +addi sp, sp, 16; +---- + +===== cm.pop {ra, s0-s3}, 48 + +Encoding: _rlist_=8, _spimm_=1 + +expands to: + +[source,sail] +---- +lw s3, 44(sp); +lw s2, 40(sp); +lw s1, 36(sp); +lw s0, 32(sp); +lw ra, 28(sp); +addi sp, sp, 48; +---- + +===== cm.pop {ra, s0-s4}, 64 + +Encoding: _rlist_=9, _spimm_=2 + +expands to: + +[source,sail] +---- +lw s4, 60(sp); +lw s3, 56(sp); +lw s2, 52(sp); +lw s1, 48(sp); +lw s0, 44(sp); +lw ra, 40(sp); +addi sp, sp, 64; +---- + + +<<< +[#insns-cm_push,reftext="Create stack frame: push registers, allocate additional stack space."] +==== cm.push + +Synopsis: + +Create stack frame: store ra and 0 to 12 saved registers to the stack frame, optionally allocate additional stack space. + +Mnemonic: + +cm.push _{reg_list}, -stack_adj_ + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 2, name: 'spimm\[5:4\]', attr: [] }, + { bits: 4, name: 'rlist', attr: [] }, + { bits: 5, name: 0x18, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... 
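As an illustration of the encoding diagram, the following Python sketch packs the _cm.push_ fields into a 16-bit value. It assumes the diagram lists fields starting from the least significant bit, so the `C2` quadrant occupies bits 1:0 and `FUNCT3` bits 15:13; `encode_cm_push` is a hypothetical helper name, not an assembler API.

```python
def encode_cm_push(rlist, spimm):
    """Pack rlist and spimm[5:4] into the 16-bit cm.push encoding."""
    if not 4 <= rlist <= 15:
        raise ValueError("rlist values 0 to 3 are reserved")
    if not 0 <= spimm <= 3:
        raise ValueError("spimm is a 2-bit field")
    return (
        (0x5 << 13)      # FUNCT3, bits 15:13
        | (0x18 << 8)    # fixed field 0x18, bits 12:8
        | (rlist << 4)   # rlist, bits 7:4
        | (spimm << 2)   # spimm[5:4], bits 3:2
        | 0x2            # C2 quadrant, bits 1:0
    )


# cm.push {ra, s0-s2}, -64 uses rlist=7, spimm=3
encoding = encode_cm_push(7, 3)
```

Under these assumptions the example instruction encodes as `0xB87E`.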
+ +[NOTE] +==== +_rlist_ values 0 to 3 are reserved for a future EABI variant called _cm.push.e_ +==== + +Assembly Syntax: + +[source,sail] +-- +cm.push {reg_list}, -stack_adj +cm.push {xreg_list}, -stack_adj +-- + +The variables used in the assembly syntax are defined below. + +[source,sail] +---- +RV32E: + +switch (rlist){ + case 4: {reg_list="ra"; xreg_list="x1";} + case 5: {reg_list="ra, s0"; xreg_list="x1, x8";} + case 6: {reg_list="ra, s0-s1"; xreg_list="x1, x8-x9";} + default: reserved(); +} +stack_adj = stack_adj_base + spimm[5:4] * 16; +---- + +[source,sail] +---- +RV32I, RV64: + +switch (rlist){ + case 4: {reg_list="ra"; xreg_list="x1";} + case 5: {reg_list="ra, s0"; xreg_list="x1, x8";} + case 6: {reg_list="ra, s0-s1"; xreg_list="x1, x8-x9";} + case 7: {reg_list="ra, s0-s2"; xreg_list="x1, x8-x9, x18";} + case 8: {reg_list="ra, s0-s3"; xreg_list="x1, x8-x9, x18-x19";} + case 9: {reg_list="ra, s0-s4"; xreg_list="x1, x8-x9, x18-x20";} + case 10: {reg_list="ra, s0-s5"; xreg_list="x1, x8-x9, x18-x21";} + case 11: {reg_list="ra, s0-s6"; xreg_list="x1, x8-x9, x18-x22";} + case 12: {reg_list="ra, s0-s7"; xreg_list="x1, x8-x9, x18-x23";} + case 13: {reg_list="ra, s0-s8"; xreg_list="x1, x8-x9, x18-x24";} + case 14: {reg_list="ra, s0-s9"; xreg_list="x1, x8-x9, x18-x25";} + //note - to include s10, s11 must also be included + case 15: {reg_list="ra, s0-s11"; xreg_list="x1, x8-x9, x18-x27";} + default: reserved(); +} +stack_adj = stack_adj_base + spimm[5:4] * 16; +---- + +[source,sail] +---- +RV32E: + +stack_adj_base = 16; +Valid values: +stack_adj = [16|32|48|64]; +---- + +[source,sail] +---- +RV32I: + +switch (rlist) { + case 4.. 7: stack_adj_base = 16; + case 8..11: stack_adj_base = 32; + case 12..14: stack_adj_base = 48; + case 15: stack_adj_base = 64; +} + +Valid values: +switch (rlist) { + case 4.. 
7: stack_adj = [16|32|48| 64]; + case 8..11: stack_adj = [32|48|64| 80]; + case 12..14: stack_adj = [48|64|80| 96]; + case 15: stack_adj = [64|80|96|112]; +} +---- + +[source,sail] +---- +RV64: + +switch (rlist) { + case 4.. 5: stack_adj_base = 16; + case 6.. 7: stack_adj_base = 32; + case 8.. 9: stack_adj_base = 48; + case 10..11: stack_adj_base = 64; + case 12..13: stack_adj_base = 80; + case 14: stack_adj_base = 96; + case 15: stack_adj_base = 112; +} + +Valid values: +switch (rlist) { + case 4.. 5: stack_adj = [ 16| 32| 48| 64]; + case 6.. 7: stack_adj = [ 32| 48| 64| 80]; + case 8.. 9: stack_adj = [ 48| 64| 80| 96]; + case 10..11: stack_adj = [ 64| 80| 96|112]; + case 12..13: stack_adj = [ 80| 96|112|128]; + case 14: stack_adj = [ 96|112|128|144]; + case 15: stack_adj = [112|128|144|160]; +} +---- + +<<< +Description: + +This instruction pushes (stores) the registers in _reg_list_ to the memory below the stack pointer, +and then creates the stack frame by decrementing the stack pointer by _stack_adj_, +including any additional stack space requested by the value of _spimm_. + + +[NOTE] +==== +All ABI register mappings are for the UABI. An EABI version is planned once the EABI is frozen. +==== + +For further information see <>. + +Stack Adjustment Calculation: + +_stack_adj_base_ is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list. + +_spimm_ is the number of additional 16-byte address increments allocated for the stack frame. + +The total stack adjustment represents the total size of the stack frame, which is _stack_adj_base_ added to _spimm_ scaled by 16, +as defined above. + +Prerequisites: + +None + +32-bit equivalent: + +No direct equivalent encoding exists + +Operation: + +The first section of pseudo-code may be executed multiple times before the instruction successfully completes. + +[source,sail] +---- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
+ +if (XLEN==32) bytes=4; else bytes=8; + +addr=sp-bytes; +for(i in 27,26,25,24,23,22,21,20,19,18,9,8,1) { + //if register i is in xreg_list + if (xreg_list[i]) { + switch(bytes) { + 4: asm("sw x[i], 0(addr)"); + 8: asm("sd x[i], 0(addr)"); + } + addr-=bytes; + } +} +---- + +The final section of pseudo-code executes atomically, and only executes if the section above completes without any exceptions or interrupts. + +[source,sail] +---- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +sp-=stack_adj; +---- + +<<< +[#insns-cm_pop,reftext="Pop registers, deallocate stack frame."] +==== cm.pop + +Synopsis: + +Destroy stack frame: load ra and 0 to 12 saved registers from the stack frame, deallocate the stack frame. + +Mnemonic: + +cm.pop _{reg_list}, stack_adj_ + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 2, name: 'spimm\[5:4\]', attr: [] }, + { bits: 4, name: 'rlist', attr: [] }, + { bits: 5, name: 0x1a, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +[NOTE] +==== +_rlist_ values 0 to 3 are reserved for a future EABI variant called _cm.pop.e_ +==== + +Assembly Syntax: + +[source,sail] +---- +cm.pop {reg_list}, stack_adj +cm.pop {xreg_list}, stack_adj +---- + +The variables used in the assembly syntax are defined below. 
+ +[source,sail] +---- +RV32E: + +switch (rlist){ + case 4: {reg_list="ra"; xreg_list="x1";} + case 5: {reg_list="ra, s0"; xreg_list="x1, x8";} + case 6: {reg_list="ra, s0-s1"; xreg_list="x1, x8-x9";} + default: reserved(); +} +stack_adj = stack_adj_base + spimm[5:4] * 16; +---- + +[source,sail] +---- +RV32I, RV64: + +switch (rlist){ + case 4: {reg_list="ra"; xreg_list="x1";} + case 5: {reg_list="ra, s0"; xreg_list="x1, x8";} + case 6: {reg_list="ra, s0-s1"; xreg_list="x1, x8-x9";} + case 7: {reg_list="ra, s0-s2"; xreg_list="x1, x8-x9, x18";} + case 8: {reg_list="ra, s0-s3"; xreg_list="x1, x8-x9, x18-x19";} + case 9: {reg_list="ra, s0-s4"; xreg_list="x1, x8-x9, x18-x20";} + case 10: {reg_list="ra, s0-s5"; xreg_list="x1, x8-x9, x18-x21";} + case 11: {reg_list="ra, s0-s6"; xreg_list="x1, x8-x9, x18-x22";} + case 12: {reg_list="ra, s0-s7"; xreg_list="x1, x8-x9, x18-x23";} + case 13: {reg_list="ra, s0-s8"; xreg_list="x1, x8-x9, x18-x24";} + case 14: {reg_list="ra, s0-s9"; xreg_list="x1, x8-x9, x18-x25";} + //note - to include s10, s11 must also be included + case 15: {reg_list="ra, s0-s11"; xreg_list="x1, x8-x9, x18-x27";} + default: reserved(); +} +stack_adj = stack_adj_base + spimm[5:4] * 16; +---- + +[source,sail] +---- +RV32E: + +stack_adj_base = 16; +Valid values: +stack_adj = [16|32|48|64]; +---- + +[source,sail] +---- +RV32I: + +switch (rlist) { + case 4.. 7: stack_adj_base = 16; + case 8..11: stack_adj_base = 32; + case 12..14: stack_adj_base = 48; + case 15: stack_adj_base = 64; +} + +Valid values: +switch (rlist) { + case 4.. 7: stack_adj = [16|32|48| 64]; + case 8..11: stack_adj = [32|48|64| 80]; + case 12..14: stack_adj = [48|64|80| 96]; + case 15: stack_adj = [64|80|96|112]; +} +---- + +[source,sail] +---- +RV64: + +switch (rlist) { + case 4.. 5: stack_adj_base = 16; + case 6.. 7: stack_adj_base = 32; + case 8.. 
9: stack_adj_base = 48; + case 10..11: stack_adj_base = 64; + case 12..13: stack_adj_base = 80; + case 14: stack_adj_base = 96; + case 15: stack_adj_base = 112; +} + +Valid values: +switch (rlist) { + case 4.. 5: stack_adj = [ 16| 32| 48| 64]; + case 6.. 7: stack_adj = [ 32| 48| 64| 80]; + case 8.. 9: stack_adj = [ 48| 64| 80| 96]; + case 10..11: stack_adj = [ 64| 80| 96|112]; + case 12..13: stack_adj = [ 80| 96|112|128]; + case 14: stack_adj = [ 96|112|128|144]; + case 15: stack_adj = [112|128|144|160]; +} +---- + +<<< + +Description: + +This instruction pops (loads) the registers in _reg_list_ from stack memory, +and then adjusts the stack pointer by _stack_adj_. + +[NOTE] +==== +All ABI register mappings are for the UABI. An EABI version is planned once the EABI is frozen. +==== + +For further information see <>. + +Stack Adjustment Calculation: + +_stack_adj_base_ is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list. + +_spimm_ is the number of additional 16-byte address increments allocated for the stack frame. + +The total stack adjustment represents the total size of the stack frame, which is _stack_adj_base_ added to _spimm_ scaled by 16, +as defined above. + +Prerequisites: + +None + +32-bit equivalent: + +No direct equivalent encoding exists + +Operation: + +The first section of pseudo-code may be executed multiple times before the instruction successfully completes. + +[source,sail] +---- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +if (XLEN==32) bytes=4; else bytes=8; + +addr=sp+stack_adj-bytes; +for(i in 27,26,25,24,23,22,21,20,19,18,9,8,1) { + //if register i is in xreg_list + if (xreg_list[i]) { + switch(bytes) { + 4: asm("lw x[i], 0(addr)"); + 8: asm("ld x[i], 0(addr)"); + } + addr-=bytes; + } +} +---- + +The final section of pseudo-code executes atomically, and only executes if the section above completes without any exceptions or interrupts. 
+ +[source,sail] +---- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +sp+=stack_adj; +---- + +<<< +[#insns-cm_popretz,reftext="Pop registers, deallocate stack frame, return zero."] +==== cm.popretz + +Synopsis: + +Destroy stack frame: load ra and 0 to 12 saved registers from the stack frame, deallocate the stack frame, move zero into a0, return to ra. + +Mnemonic: + +cm.popretz _{reg_list}, stack_adj_ + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 2, name: 'spimm\[5:4\]', attr: [] }, + { bits: 4, name: 'rlist', attr: [] }, + { bits: 5, name: 0x1c, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +[NOTE] +==== +_rlist_ values 0 to 3 are reserved for a future EABI variant called _cm.popretz.e_ +==== + +Assembly Syntax: + +[source,sail] +---- +cm.popretz {reg_list}, stack_adj +cm.popretz {xreg_list}, stack_adj +---- + +[source,sail] +---- +RV32E: + +switch (rlist){ + case 4: {reg_list="ra"; xreg_list="x1";} + case 5: {reg_list="ra, s0"; xreg_list="x1, x8";} + case 6: {reg_list="ra, s0-s1"; xreg_list="x1, x8-x9";} + default: reserved(); +} +stack_adj = stack_adj_base + spimm[5:4] * 16; +---- + +[source,sail] +---- +RV32I, RV64: + +switch (rlist){ + case 4: {reg_list="ra"; xreg_list="x1";} + case 5: {reg_list="ra, s0"; xreg_list="x1, x8";} + case 6: {reg_list="ra, s0-s1"; xreg_list="x1, x8-x9";} + case 7: {reg_list="ra, s0-s2"; xreg_list="x1, x8-x9, x18";} + case 8: {reg_list="ra, s0-s3"; xreg_list="x1, x8-x9, x18-x19";} + case 9: {reg_list="ra, s0-s4"; xreg_list="x1, x8-x9, x18-x20";} + case 10: {reg_list="ra, s0-s5"; xreg_list="x1, x8-x9, x18-x21";} + case 11: {reg_list="ra, s0-s6"; xreg_list="x1, x8-x9, x18-x22";} + case 12: {reg_list="ra, s0-s7"; xreg_list="x1, x8-x9, x18-x23";} + case 13: {reg_list="ra, s0-s8"; xreg_list="x1, x8-x9, x18-x24";} + case 14: {reg_list="ra, s0-s9"; xreg_list="x1, x8-x9, x18-x25";} + //note - to include s10, 
s11 must also be included + case 15: {reg_list="ra, s0-s11"; xreg_list="x1, x8-x9, x18-x27";} + default: reserved(); +} +stack_adj = stack_adj_base + spimm[5:4] * 16; +---- + +[source,sail] +---- +RV32E: + +stack_adj_base = 16; +Valid values: +stack_adj = [16|32|48|64]; +---- + +[source,sail] +---- +RV32I: + +switch (rlist) { + case 4.. 7: stack_adj_base = 16; + case 8..11: stack_adj_base = 32; + case 12..14: stack_adj_base = 48; + case 15: stack_adj_base = 64; +} + +Valid values: +switch (rlist) { + case 4.. 7: stack_adj = [16|32|48| 64]; + case 8..11: stack_adj = [32|48|64| 80]; + case 12..14: stack_adj = [48|64|80| 96]; + case 15: stack_adj = [64|80|96|112]; +} +---- + +[source,sail] +---- +RV64: + +switch (rlist) { + case 4.. 5: stack_adj_base = 16; + case 6.. 7: stack_adj_base = 32; + case 8.. 9: stack_adj_base = 48; + case 10..11: stack_adj_base = 64; + case 12..13: stack_adj_base = 80; + case 14: stack_adj_base = 96; + case 15: stack_adj_base = 112; +} + +Valid values: +switch (rlist) { + case 4.. 5: stack_adj = [ 16| 32| 48| 64]; + case 6.. 7: stack_adj = [ 32| 48| 64| 80]; + case 8.. 9: stack_adj = [ 48| 64| 80| 96]; + case 10..11: stack_adj = [ 64| 80| 96|112]; + case 12..13: stack_adj = [ 80| 96|112|128]; + case 14: stack_adj = [ 96|112|128|144]; + case 15: stack_adj = [112|128|144|160]; +} +---- + +<<< + +Description: + +This instruction pops (loads) the registers in _reg_list_ from stack memory, adjusts the stack pointer by _stack_adj_, moves zero into a0 and then returns to _ra_. + +[NOTE] +==== +All ABI register mappings are for the UABI. An EABI version is planned once the EABI is frozen. +==== + +For further information see <>. + +Stack Adjustment Calculation: + +_stack_adj_base_ is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list. + +_spimm_ is the number of additional 16-byte address increments allocated for the stack frame. 
+ +The total stack adjustment represents the total size of the stack frame, which is _stack_adj_base_ added to _spimm_ scaled by 16, as defined above. + +Prerequisites: + +None + +32-bit equivalent: + +No direct equivalent encoding exists + + +Operation: + +The first section of pseudo-code may be executed multiple times before the instruction successfully completes. + +[source,sail] +---- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +if (XLEN==32) bytes=4; else bytes=8; + +addr=sp+stack_adj-bytes; +for(i in 27,26,25,24,23,22,21,20,19,18,9,8,1) { + //if register i is in xreg_list + if (xreg_list[i]) { + switch(bytes) { + 4: asm("lw x[i], 0(addr)"); + 8: asm("ld x[i], 0(addr)"); + } + addr-=bytes; + } +} +---- + +The final section of pseudo-code executes atomically, and only executes if the section above completes without any exceptions or interrupts. + +[NOTE] +==== +The _li a0, 0_ *could* be executed more than once, but is included in the atomic section for convenience. +==== + +[source,sail] +---- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +asm("li a0, 0"); +sp+=stack_adj; +asm("ret"); +---- + +<<< +[#insns-cm_popret,reftext="Pop registers, deallocate stack frame, return."] +==== cm.popret + +Synopsis: + +Destroy stack frame: load ra and 0 to 12 saved registers from the stack frame, deallocate the stack frame, return to ra. + +Mnemonic: + +cm.popret _{reg_list}, stack_adj_ + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 2, name: 'spimm\[5:4\]', attr: [] }, + { bits: 4, name: 'rlist', attr: [] }, + { bits: 5, name: 0x1e, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... 
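The four PUSH/POP encodings differ only in the 5-bit field at bits 12:8 (0x18, 0x1a, 0x1c and 0x1e in the diagrams). A decoder could therefore be sketched as below; this is a hedged illustration that assumes the diagrams list fields from the least significant bit, and `decode_zcmp_pushpop` is a hypothetical name.

```python
ZCMP_PUSHPOP = {
    0x18: "cm.push",
    0x1A: "cm.pop",
    0x1C: "cm.popretz",
    0x1E: "cm.popret",
}


def decode_zcmp_pushpop(insn):
    """Return (mnemonic, rlist, spimm), or None if insn is not a valid match."""
    if insn & 0x3 != 0x2:            # C2 quadrant
        return None
    if (insn >> 13) & 0x7 != 0x5:    # FUNCT3
        return None
    mnemonic = ZCMP_PUSHPOP.get((insn >> 8) & 0x1F)
    if mnemonic is None:
        return None
    rlist = (insn >> 4) & 0xF
    spimm = (insn >> 2) & 0x3
    if rlist < 4:                    # rlist values 0 to 3 are reserved
        return None
    return mnemonic, rlist, spimm
```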
+ +[NOTE] +==== +_rlist_ values 0 to 3 are reserved for a future EABI variant called _cm.popret.e_ +==== + +Assembly Syntax: + +[source,sail] +---- +cm.popret {reg_list}, stack_adj +cm.popret {xreg_list}, stack_adj +---- + +The variables used in the assembly syntax are defined below. + +[source,sail] +---- +RV32E: + +switch (rlist){ + case 4: {reg_list="ra"; xreg_list="x1";} + case 5: {reg_list="ra, s0"; xreg_list="x1, x8";} + case 6: {reg_list="ra, s0-s1"; xreg_list="x1, x8-x9";} + default: reserved(); +} +stack_adj = stack_adj_base + spimm[5:4] * 16; +---- + +[source,sail] +---- +RV32I, RV64: + +switch (rlist){ + case 4: {reg_list="ra"; xreg_list="x1";} + case 5: {reg_list="ra, s0"; xreg_list="x1, x8";} + case 6: {reg_list="ra, s0-s1"; xreg_list="x1, x8-x9";} + case 7: {reg_list="ra, s0-s2"; xreg_list="x1, x8-x9, x18";} + case 8: {reg_list="ra, s0-s3"; xreg_list="x1, x8-x9, x18-x19";} + case 9: {reg_list="ra, s0-s4"; xreg_list="x1, x8-x9, x18-x20";} + case 10: {reg_list="ra, s0-s5"; xreg_list="x1, x8-x9, x18-x21";} + case 11: {reg_list="ra, s0-s6"; xreg_list="x1, x8-x9, x18-x22";} + case 12: {reg_list="ra, s0-s7"; xreg_list="x1, x8-x9, x18-x23";} + case 13: {reg_list="ra, s0-s8"; xreg_list="x1, x8-x9, x18-x24";} + case 14: {reg_list="ra, s0-s9"; xreg_list="x1, x8-x9, x18-x25";} + //note - to include s10, s11 must also be included + case 15: {reg_list="ra, s0-s11"; xreg_list="x1, x8-x9, x18-x27";} + default: reserved(); +} +stack_adj = stack_adj_base + spimm[5:4] * 16; +---- + +[source,sail] +---- +RV32E: + +stack_adj_base = 16; +Valid values: +stack_adj = [16|32|48|64]; +---- + +[source,sail] +---- +RV32I: + +switch (rlist) { + case 4.. 7: stack_adj_base = 16; + case 8..11: stack_adj_base = 32; + case 12..14: stack_adj_base = 48; + case 15: stack_adj_base = 64; +} + +Valid values: +switch (rlist) { + case 4.. 
7: stack_adj = [16|32|48| 64]; + case 8..11: stack_adj = [32|48|64| 80]; + case 12..14: stack_adj = [48|64|80| 96]; + case 15: stack_adj = [64|80|96|112]; +} +---- + +[source,sail] +---- +RV64: + +switch (rlist) { + case 4.. 5: stack_adj_base = 16; + case 6.. 7: stack_adj_base = 32; + case 8.. 9: stack_adj_base = 48; + case 10..11: stack_adj_base = 64; + case 12..13: stack_adj_base = 80; + case 14: stack_adj_base = 96; + case 15: stack_adj_base = 112; +} + +Valid values: +switch (rlist) { + case 4.. 5: stack_adj = [ 16| 32| 48| 64]; + case 6.. 7: stack_adj = [ 32| 48| 64| 80]; + case 8.. 9: stack_adj = [ 48| 64| 80| 96]; + case 10..11: stack_adj = [ 64| 80| 96|112]; + case 12..13: stack_adj = [ 80| 96|112|128]; + case 14: stack_adj = [ 96|112|128|144]; + case 15: stack_adj = [112|128|144|160]; +} +---- + +<<< + +Description: + +This instruction pops (loads) the registers in _reg_list_ from stack memory, adjusts the stack pointer by _stack_adj_ and then returns to _ra_. + +[NOTE] +==== +All ABI register mappings are for the UABI. An EABI version is planned once the EABI is frozen. +==== + +For further information see <>. + +Stack Adjustment Calculation: + +_stack_adj_base_ is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list. + +_spimm_ is the number of additional 16-byte address increments allocated for the stack frame. + +The total stack adjustment represents the total size of the stack frame, which is _stack_adj_base_ added to _spimm_ scaled by 16, as defined above. + +Prerequisites: + +None + +32-bit equivalent: + +No direct equivalent encoding exists + +Operation: + +The first section of pseudo-code may be executed multiple times before the instruction successfully completes. + +[source,sail] +---- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
+ +if (XLEN==32) bytes=4; else bytes=8; + +addr=sp+stack_adj-bytes; +for(i in 27,26,25,24,23,22,21,20,19,18,9,8,1) { + //if register i is in xreg_list + if (xreg_list[i]) { + switch(bytes) { + 4: asm("lw x[i], 0(addr)"); + 8: asm("ld x[i], 0(addr)"); + } + addr-=bytes; + } +} +---- + +The final section of pseudo-code executes atomically, and only executes if the section above completes without any exceptions or interrupts. + +[source,sail] +---- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +sp+=stack_adj; +asm("ret"); +---- + +<<< + +[#insns-cm_mvsa01,reftext="Move a0-a1 into two different s0-s7 registers"] +==== cm.mvsa01 + +Synopsis: + +Move a0-a1 into two registers of s0-s7 + +Mnemonic: + +cm.mvsa01 _r1s'_, _r2s'_ + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 3, name: 'r2s\'', attr: [] }, + { bits: 2, name: 0x1, attr: [] }, + { bits: 3, name: 'r1s\'', attr: [] }, + { bits: 3, name: 0x3, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +[NOTE] +==== +For the encoding to be legal _r1s'_ != _r2s'_. +==== + +Assembly Syntax: + +[source,sail] +---- +cm.mvsa01 r1s', r2s' +---- + +Description: +This instruction moves _a0_ into _r1s'_ and _a1_ into _r2s'_. _r1s'_ and _r2s'_ must be different. +The execution is atomic, so it is not possible to observe state where only one of _r1s'_ or _r2s'_ has been updated. + +The encoding uses _sreg_ number specifiers instead of _xreg_ number specifiers to save encoding space. +The mapping between them is specified in the pseudo-code below. + +[NOTE] +==== +The _s_ register mapping is taken from the UABI, and may not match the currently unratified EABI. _cm.mvsa01.e_ may be included in the future. +==== + +Prerequisites: + +None + +32-bit equivalent: + +No direct equivalent encoding exists. + +Operation: + +[source,sail] +---- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
+if (RV32E && (r1sc>1 || r2sc>1)) { + reserved(); +} +xreg1 = {r1sc[2:1]>0,r1sc[2:1]==0,r1sc[2:0]}; +xreg2 = {r2sc[2:1]>0,r2sc[2:1]==0,r2sc[2:0]}; +X[xreg1] = X[10]; +X[xreg2] = X[11]; +---- + +<<< + +[#insns-cm_mva01s,reftext="Move two s0-s7 registers into a0-a1"] +==== cm.mva01s + +Synopsis: + +Move two s0-s7 registers into a0-a1 + +Mnemonic: + +cm.mva01s _r1s'_, _r2s'_ + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 3, name: 'r2s\'', attr: [] }, + { bits: 2, name: 0x3, attr: [] }, + { bits: 3, name: 'r1s\'', attr: [] }, + { bits: 3, name: 0x3, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +Assembly Syntax: + +[source,sail] +---- +cm.mva01s r1s', r2s' +---- + +Description: +This instruction moves _r1s'_ into _a0_ and _r2s'_ into _a1_. +The execution is atomic, so it is not possible to observe state where only one of _a0_ or _a1_ has been updated. + +The encoding uses _sreg_ number specifiers instead of _xreg_ number specifiers to save encoding space. +The mapping between them is specified in the pseudo-code below. + +[NOTE] +==== +The _s_ register mapping is taken from the UABI, and may not match the currently unratified EABI. _cm.mva01s.e_ may be included in the future. +==== + +Prerequisites: + +None + +32-bit equivalent: + +No direct equivalent encoding exists. + +Operation: + +[source,sail] +---- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. +if (RV32E && (r1sc>1 || r2sc>1)) { + reserved(); +} +xreg1 = {r1sc[2:1]>0,r1sc[2:1]==0,r1sc[2:0]}; +xreg2 = {r2sc[2:1]>0,r2sc[2:1]==0,r2sc[2:0]}; +X[10] = X[xreg1]; +X[11] = X[xreg2]; +---- + +<<< + +[#insns-tablejump,reftext="Table Jump Overview"] +=== Table Jump Overview + +_cm.jt_ (<<#insns-cm_jt>>) and _cm.jalt_ (<<#insns-cm_jalt>>) are referred to as table jump. + +Table jump uses a 256-entry XLEN wide table in instruction memory to contain function addresses.
+The table must be at least 64-byte aligned. + +Table entries follow the current data endianness. This is different from normal instruction fetch which is always little-endian. + +_cm.jt_ and _cm.jalt_ encodings index the table, giving access to functions within the full XLEN wide address space. + +This is used as a form of dictionary compression to reduce the code size of _jal_ / _auipc+jalr_ / _jr_ / _auipc+jr_ instructions. + +Table jump allows the linker to replace the following instruction sequences with a _cm.jt_ or _cm.jalt_ encoding, and an entry in the table: + +* 32-bit _j_ calls +* 32-bit _jal ra_ calls +* 64-bit _auipc+jr_ calls to fixed locations +* 64-bit _auipc+jalr ra_ calls to fixed locations +** The _auipc+jr/jalr_ sequence is used because the offset from the PC is out of the ±1MB range. + +If a return address stack is implemented, then as _cm.jalt_ is equivalent to _jal ra_, it pushes to the stack. + +==== JVT + +The base of the table is in the JVT CSR (see <>); each table entry is XLEN bits wide. + +If the same function is called with and without linking then it must have two entries in the table. +This is typically caused by the same function being called with and without tail calling. + +[#tablejump-fault-handling] +==== Table Jump Fault handling + +For a table jump instruction, the table entry that the instruction selects is considered an extension of the instruction itself. +Hence, the execution of a table jump instruction involves two instruction fetches, the first to read the instruction (_cm.jt_/_cm.jalt_) +and the second to read from the jump vector table (JVT). Both instruction fetches are _implicit_ reads, and both require +execute permission; read permission is irrelevant. It is recommended that the second fetch be ignored for hardware triggers and breakpoints. + +Memory writes to the jump vector table require an instruction barrier (_fence.i_) to guarantee that they are visible to the instruction fetch.
+ +Multiple contexts may have different jump vector tables. JVT may be switched between them without an instruction barrier +if the tables have not been updated in memory since the last _fence.i_. + +If an exception occurs on either instruction fetch, xEPC is set to the PC of the table jump instruction, xCAUSE is set as expected for the type of fault and xTVAL (if not set to zero) contains the fetch address which caused the fault. + +<<< +[#csrs-jvt,reftext="JVT CSR, table jump base vector and control register"] +==== JVT CSR + +Synopsis: + +Table jump base vector and control register + +Address: + +0x0017 + +Permissions: + +URW + +Format (RV32): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 6, name: 'mode', attr: ['6'] }, + { bits: 26, name: 'base[XLEN-1:6] (WARL)', attr: ['XLEN-6'] }, +],config:{bits:32}} +.... + +Format (RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 6, name: 'mode', attr: ['6'] }, + { bits: 58, name: 'base[XLEN-1:6] (WARL)', attr: ['XLEN-6'] }, +],config:{bits:64}} +.... + +Description: + +The _JVT_ register is an XLEN-bit *WARL* read/write register that holds the jump table configuration, consisting of the jump table base address (BASE) and the jump table mode (MODE). + +If <> is implemented then _JVT_ must also be implemented, but can contain a read-only value. If _JVT_ is writable, the set of values the register may hold can vary by implementation. The value in the BASE field must always be aligned on a 64-byte boundary. + +_JVT.base_ is a virtual address, whenever virtual memory is enabled. + +The memory pointed to by _JVT.base_ is treated as instruction memory for the purpose of executing table jump instructions, implying execute access permission. 
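The _JVT_ layout (a 6-bit mode field in the low bits and a 64-byte-aligned base above it) can be sketched in Python as follows. This is an illustrative model only: `pack_jvt` and `unpack_jvt` are hypothetical helpers, and the base is treated as a byte address whose low six bits are zero, which is one reading of the `base[XLEN-1:6]` field.

```python
JVT_MODE_MASK = 0x3F  # mode occupies bits 5:0


def pack_jvt(base, mode=0, xlen=32):
    """Combine a 64-byte-aligned base address and a mode into a JVT value."""
    if base & JVT_MODE_MASK:
        raise ValueError("base must be aligned on a 64-byte boundary")
    if base >> xlen:
        raise ValueError("base does not fit in XLEN bits")
    if not 0 <= mode <= JVT_MODE_MASK:
        raise ValueError("mode is a 6-bit field")
    return base | mode


def unpack_jvt(value):
    """Split a JVT value into (base, mode)."""
    return value & ~JVT_MODE_MASK, value & JVT_MODE_MASK
```

Because the register is WARL, a real implementation may return a different value on read-back than the one written; this model does not attempt to capture that.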
+ +[#JVT-config-table] +._JVT.mode_ definition +[width="60%",options=header] +|============================================================================================= +| JVT.mode | Comment +| 000000 | Jump table mode +| others | *reserved for future standard use* +|============================================================================================= + +_JVT.mode_ is a *WARL* field, so can only be programmed to modes which are implemented. Therefore the discovery mechanism is to +attempt to program different modes and read back the values to see which are available. Jump table mode _must_ be implemented. + +[NOTE] +==== +In the future, the RISC-V Unified Discovery method will report the available modes. +==== + +Architectural State: + +_JVT_ adds architectural state to the system software context (such as an OS process), and therefore must be saved/restored on context switches. + +State Enable: + +If the Smstateen extension is implemented, then bit 2 in _mstateen0_, _sstateen0_, and _hstateen0_ is implemented. If bit 2 of a controlling _stateen0_ CSR is zero, then access to the _JVT_ CSR and execution of a _cm.jalt_ or _cm.jt_ instruction by a lower privilege level results in an Illegal Instruction trap (or, if appropriate, a Virtual Instruction trap). + +<<< +[#insns-cm_jt,reftext="Jump via table"] +==== cm.jt + +Synopsis: + +jump via table + +Mnemonic: + +cm.jt _index_ + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 8, name: 'index', attr: [] }, + { bits: 3, name: 0x0, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +[NOTE] +==== +For this encoding to decode as _cm.jt_, _index<32_, otherwise it decodes as _cm.jalt_, see <>. +==== + +[NOTE] +==== +If JVT.mode = 0 (Jump Table Mode) then _cm.jt_ behaves as specified here. If JVT.mode is a reserved value, then _cm.jt_ is also reserved. 
In the future other defined values of JVT.mode may change the behaviour of _cm.jt_. +==== + +Assembly Syntax: + +[source,sail] +---- +cm.jt index +---- + +Description: + +_cm.jt_ reads an entry from the jump vector table in memory and jumps to the address that was read. + +For further information see <>. + +Prerequisites: + +None + +32-bit equivalent: + +No direct equivalent encoding exists. + +<<< + +[#insns-cm_jt-SAIL,reftext="cm.jt SAIL code"] +Operation: + +[source,sail] +---- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +# target_address is temporary internal state, it doesn't represent a real register +# InstMemory is byte indexed + +switch(XLEN) { + 32: table_address[XLEN-1:0] = JVT.base + (index<<2); + 64: table_address[XLEN-1:0] = JVT.base + (index<<3); +} + +//fetch from the jump table +target_address[XLEN-1:0] = InstMemory[table_address][XLEN-1:0]; + +j target_address[XLEN-1:0]&~0x1; + +---- + +<<< +[#insns-cm_jalt,reftext="Jump and link via table"] +==== cm.jalt + +Synopsis: + +jump via table with optional link + +Mnemonic: + +cm.jalt _index_ + +Encoding (RV32, RV64): + +[wavedrom, , svg] +.... +{reg:[ + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 8, name: 'index', attr: [] }, + { bits: 3, name: 0x0, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, +],config:{bits:16}} +.... + +[NOTE] +==== +For this encoding to decode as _cm.jalt_, _index>=32_, otherwise it decodes as _cm.jt_, see <>. +==== + +[NOTE] +==== +If JVT.mode = 0 (Jump Table Mode) then _cm.jalt_ behaves as specified here. If JVT.mode is a reserved value, then _cm.jalt_ is also reserved. In the future other defined values of JVT.mode may change the behaviour of _cm.jalt_. +==== + +Assembly Syntax: + +[source,sail] +---- +cm.jalt index +---- + +Description: + +_cm.jalt_ reads an entry from the jump vector table in memory and jumps to the address that was read, linking to _ra_. + +For further information see <>. 
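The table jump behaviour can be summarised with a small Python model. It is a sketch under stated assumptions rather than a reference implementation: `table_jump` and the `read_entry` callback are hypothetical, JVT.mode is taken to be 0, the index field is assumed to sit at bits 9:2 of the encoding, and the link value is assumed to be the address of the next instruction (`pc + 2` for a 16-bit encoding), in line with _cm.jalt_ being equivalent to _jal ra_.

```python
def table_jump(insn, jvt_base, read_entry, pc, xlen=32):
    """Model cm.jt / cm.jalt with JVT.mode = 0.

    read_entry(addr) returns the XLEN-bit table entry at byte address addr.
    Returns (target, new_ra); new_ra is None when no link is performed.
    """
    index = (insn >> 2) & 0xFF                   # 8-bit index field
    entry_addr = jvt_base + index * (xlen // 8)  # index<<2 on RV32, index<<3 on RV64
    target = read_entry(entry_addr) & ~0x1       # bit 0 of the entry is cleared
    if index < 32:
        return target, None                      # cm.jt: plain jump
    return target, pc + 2                        # cm.jalt: jump and link to ra


# Two-entry toy table at base 0x1000 on RV32: index 3 and index 40.
table = {0x100C: 0x3000, 0x10A0: 0x2001}
jt_result = table_jump(0xA00E, 0x1000, table.__getitem__, pc=0x100)
jalt_result = table_jump(0xA0A2, 0x1000, table.__getitem__, pc=0x100)
```

Here index 3 decodes as _cm.jt_ and index 40 as _cm.jalt_, so only the second call produces a link value.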
+ +Prerequisites: + +None + +32-bit equivalent: + +No direct equivalent encoding exists. + +<<< + +[#insns-cm_jalt-SAIL,reftext="cm.jalt SAIL code"] +Operation: + +[source,sail] +---- +//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. + +# target_address is temporary internal state, it doesn't represent a real register +# InstMemory is byte indexed + +switch(XLEN) { + 32: table_address[XLEN-1:0] = JVT.base + (index<<2); + 64: table_address[XLEN-1:0] = JVT.base + (index<<3); +} + +//fetch from the jump table +target_address[XLEN-1:0] = InstMemory[table_address][XLEN-1:0]; + +jal ra, target_address[XLEN-1:0]&~0x1; + +---- + + + diff --git a/src/zc/Zc.adoc b/src/zc/Zc.adoc deleted file mode 100644 index ee71a73..0000000 --- a/src/zc/Zc.adoc +++ /dev/null @@ -1,396 +0,0 @@ -//:sectnums: -//:version-label: v1.0.4-2 -//:lifecycle-state: ratified - -[#Zc] -== Zc* {version-label} - -=== Change history since v0.70.1 (tagged release) - -.Change history -[width="100%",options=header] -|==================================================================================== -|Version | change -|v1.0.4-3 | Added misa.C clarification -|v1.0.4-2 | Added rule that C implies Zca, Zcf, Zcd - discussed in https://github.com/riscv/riscv-isa-manual/issues/1132 -|v1.0.4-1 | Added rule that Zcf implies F and Zcd implies D - discussed in https://github.com/riscv/riscv-code-size-reduction/issues/221 - -|v1.0.4 | Resolve https://github.com/riscv/riscv-code-size-reduction/issues/221 - Zcf doesn't exist on RV64 as it contains no instructions -|v1.0.3-1 | Replace statement about non-idempotent memory handler completing the sequence (non-normative) -|v1.0.3 | Add definition of Zce -|v1.0.2 | Fix Architecture Review Committee feedback on instruction formats -|v1.0.1 | Post public review fixes: Add instruction formats (issue 192). Clarify that Zcmt/Zcmp are for embedded CPUs (issue 190). Fix some typos. -|v1.0.0-RC5.7| Add Zcb description and fix some typos. 
PUBLIC REVIEW REVISION. -|v1.0.0-RC5.6| Remove Zcmpe which is _not_ frozen and is causing confusion -|v1.0.0-RC5.5| Following ARC review Adjust the split so we have 224 cm.jalt and 32 cm.jt -|v1.0.0-RC5.4| Change wording for dependencies to match arch manual "Zxxx requires Zyyy" changed to "Zxxx depends on Zyyy" -|v1.0.0-RC5.3| Add dependency on Zicsr for Zcmt -|v1.0.0-RC5.2| Adjust the split so we have 240 cm.jalt and 16 cm.jt -|v1.0.0-RC5.1| Make cm.jt/cm.jalt only valid if JVT.mode=0, and allow different behaviour in the future if JVT.mode>0 -|v1.0.0-RC5| Revert to cm.jt and cm.jalt encodings, to avoid toolchain and trace problems -|v1.0.0-RC4.1| Resolve typographical issues with the document only, no actual changes -|v1.0.0-RC4| Release candidate -| | Remove Zcmb as benefit is low. Remove cm.jalt, read LSB of jump table entry to determine whether to link -|v0.70.5 | Resolve https://github.com/riscv/riscv-code-size-reduction/issues/163 - jvt.base is WARL and fewer bits than the max can be implemented -|v0.70.4 | Clarified https://github.com/riscv/riscv-code-size-reduction/issues/159 - Need Zbb and Zba for RV64 and M/ZMmul to get _all_ of Zcb -| | Resolved https://github.com/riscv/riscv-code-size-reduction/issues/161 -| | Resolved https://github.com/riscv/riscv-code-size-reduction/issues/160 - Allocated Smstateen bit 2 and added the relevant text -|v0.70.3 | Added rule that Zcf and Zcmt imply Zca (this text was missing, this is not a spec change: https://github.com/riscv/riscv-code-size-reduction/pull/151) -| | Added that Zcf is illegal for RV64, as it contains no instructions (clarification: https://github.com/riscv/riscv-code-size-reduction/issues/149) -| | Added push/pop examples in the push/pop section -|v0.70.2 | Stylistic changes only, removing redundant text. 
-| | Corrected field names on JVT CSR diagram, and fixed synopsis for cm.mvsa01 -|==================================================================================== - -=== Zc* Overview - -This document is in the ratified state. No changes are allowed. Any desired or needed changes can be the subject of a follow-on new extension. Ratified extensions are never revised. - -Zc* is a group of extensions which define subsets of the existing C extension (Zca, Zcd, Zcf) and new extensions which only contain 16-bit encodings. - -Zcm* all reuse the encodings for _c.fld_, _c.fsd_, _c.fldsp_, _c.fsdsp_. - -.Zc* extension overview -[width="100%",options=header,cols="3,1,1,1,1,1,1"] -|==================================================================================== -|Instruction |Zca |Zcf |Zcd |Zcb |Zcmp |Zcmt -7+|*The Zca extension is added as way to refer to instructions in the C extension that do not include the floating-point loads and stores* -|C excl. c.f* |yes | | | | | -7+|*The Zcf extension is added as a way to refer to compressed single-precision floating-point load/stores* -|c.flw | |rv32 | | | | -|c.flwsp | |rv32 | | | | -|c.fsw | |rv32 | | | | -|c.fswsp | |rv32 | | | | -7+|*The Zcd extension is added as a way to refer to compressed double-precision floating-point load/stores* -|c.fld | | |yes | | | -|c.fldsp | | |yes | | | -|c.fsd | | |yes | | | -|c.fsdsp | | |yes | | | -7+|*Simple operations for use on all architectures* -|c.lbu | | | |yes | | -|c.lh | | | |yes | | -|c.lhu | | | |yes | | -|c.sb | | | |yes | | -|c.sh | | | |yes | | -|c.zext.b | | | |yes | | -|c.sext.b | | | |yes | | -|c.zext.h | | | |yes | | -|c.sext.h | | | |yes | | -|c.zext.w | | | |yes | | -|c.mul | | | |yes | | -|c.not | | | |yes | | -7+|*PUSH/POP and double move which overlap with _c.fsdsp_. 
 Complex operations intended for embedded CPUs* -|cm.push | | | | |yes | -|cm.pop | | | | |yes | -|cm.popret | | | | |yes | -|cm.popretz | | | | |yes | -|cm.mva01s | | | | |yes | -|cm.mvsa01 | | | | |yes | -7+|*Table jump which overlaps with _c.fsdsp_. Complex operations intended for embedded CPUs* -|cm.jt | | | | | |yes -|cm.jalt | | | | | |yes -|==================================================================================== - -[#C] -=== C - -The C extension is the superset of the following extensions: - -* Zca -* Zcf if F is specified (RV32 only) -* Zcd if D is specified - -As C defines the same instructions as Zca, Zcf and Zcd, the rule is that: - -* C always implies Zca -* C+F implies Zcf (RV32 only) -* C+D implies Zcd - -[#Zce] -=== Zce - -The Zce extension is intended to be used for microcontrollers, and includes all relevant Zc extensions. - -* Specifying Zce on RV32 without F includes Zca, Zcb, Zcmp, Zcmt -* Specifying Zce on RV32 with F includes Zca, Zcb, Zcmp, Zcmt _and_ Zcf -* Specifying Zce on RV64 always includes Zca, Zcb, Zcmp, Zcmt -** Zcf doesn't exist for RV64 - -Therefore common ISA strings can be updated as follows to include the relevant Zc extensions, for example: - -* RV32IMC becomes RV32IM_Zce -* RV32IMCF becomes RV32IMF_Zce - -[#misaC] -=== MISA.C - -MISA.C is set if the following extensions are selected: - -* Zca and not F -* Zca, Zcf and F is specified (RV32 only) -* Zca, Zcf and Zcd if D is specified (RV32 only) -** this configuration excludes Zcmp, Zcmt -* Zca, Zcd if D is specified (RV64 only) -** this configuration excludes Zcmp, Zcmt - -[#Zca,Zca] -=== Zca - -The Zca extension is added as a way to refer to instructions in the C extension that do not include the floating-point loads and stores. - -Therefore it _excludes_ all 16-bit floating-point loads and stores: _c.flw_, _c.flwsp_, _c.fsw_, _c.fswsp_, _c.fld_, _c.fldsp_, _c.fsd_, _c.fsdsp_.
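The implication rules listed in the C and Zce sections above can be sketched as a small lookup. This is a reading aid under stated assumptions rather than anything defined by the patch: the helper name `implied_zc` and the choice to represent an ISA configuration as an XLEN plus a set of extension names are invented for the example.

```python
def implied_zc(xlen, extensions):
    """Return the Zc* extensions implied by C and/or Zce (illustrative sketch).

    `extensions` is a hypothetical set of extension names such as
    {"C", "F", "D"} or {"Zce", "F"}.
    """
    exts = set(extensions)
    implied = set()

    if "C" in exts:
        implied.add("Zca")                  # C always implies Zca
        if "F" in exts and xlen == 32:
            implied.add("Zcf")              # C+F implies Zcf (RV32 only)
        if "D" in exts:
            implied.add("Zcd")              # C+D implies Zcd

    if "Zce" in exts:
        implied |= {"Zca", "Zcb", "Zcmp", "Zcmt"}
        if "F" in exts and xlen == 32:
            implied.add("Zcf")              # Zcf doesn't exist for RV64

    return implied
```

For example, `implied_zc(32, {"Zce", "F"})` yields Zca, Zcb, Zcmp, Zcmt and Zcf, matching the "RV32 with F" bullet, while the same call with `xlen=64` omits Zcf.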
- -[NOTE] -==== -the C extension only includes F/D instructions when D and F are also specified -==== - -[#Zcf] -=== Zcf (RV32 only) - -Zcf is the existing set of compressed single precision floating point loads and stores: _c.flw_, _c.flwsp_, _c.fsw_, _c.fswsp_. - -Zcf is only relevant to RV32, it cannot be specified for RV64. - -The Zcf extension depends on the <> and F extensions. - -[#Zcd] -=== Zcd - -Zcd is the existing set of compressed double precision floating point loads and stores: _c.fld_, _c.fldsp_, _c.fsd_, _c.fsdsp_. - -The Zcd extension depends on the <> and D extensions. - -[#Zcb] -=== Zcb - -Zcb has simple code-size saving instructions which are easy to implement on all CPUs. - -All proposed encodings are currently reserved for all architectures, and have no conflicts with any existing extensions. - -NOTE: Zcb can be implemented on _any_ CPU as the instructions are 16-bit versions of existing 32-bit instructions from the application class profile. - -The Zcb extension depends on the <> extension. - -As shown on the individual instruction pages, many of the instructions in Zcb depend upon another extension being implemented. For example, _c.mul_ is only implemented if M or Zmmul is implemented, and _c.sext.b_ is only implemented if Zbb is implemented. - -The _c.mul_ encoding uses the CA register format along with other instructions such as _c.sub_, _c.xor_ etc. 
- -[NOTE] - - _c.sext.w_ is a pseudo-instruction for _c.addiw rd, 0_ (RV64) - -[%header,cols="^1,^1,4,8"] -|=== -|RV32 -|RV64 -|Mnemonic -|Instruction - -|yes -|yes -|c.lbu _rd'_, uimm(_rs1'_) -|<<#insns-c_lbu>> - -|yes -|yes -|c.lhu _rd'_, uimm(_rs1'_) -|<<#insns-c_lhu>> - -|yes -|yes -|c.lh _rd'_, uimm(_rs1'_) -|<<#insns-c_lh>> - -|yes -|yes -|c.sb _rs2'_, uimm(_rs1'_) -|<<#insns-c_sb>> - -|yes -|yes -|c.sh _rs2'_, uimm(_rs1'_) -|<<#insns-c_sh>> - -|yes -|yes -|c.zext.b _rsd'_ -|<<#insns-c_zext_b>> - -|yes -|yes -|c.sext.b _rsd'_ -|<<#insns-c_sext_b>> - -|yes -|yes -|c.zext.h _rsd'_ -|<<#insns-c_zext_h>> - -|yes -|yes -|c.sext.h _rsd'_ -|<<#insns-c_sext_h>> - -| -|yes -|c.zext.w _rsd'_ -|<<#insns-c_zext_w>> - -|yes -|yes -|c.not _rsd'_ -|<<#insns-c_not>> - -|yes -|yes -|c.mul _rsd'_, _rs2'_ -|<<#insns-c_mul>> - -|=== - -<<< - -[#Zcmp] -=== Zcmp - -The Zcmp extension is a set of instructions which may be executed as a series of existing 32-bit RISC-V instructions. - -This extension reuses some encodings from _c.fsdsp_. Therefore it is _incompatible_ with <>, - which is included when C and D extensions are both present. - -NOTE: Zcmp is primarily targeted at embedded class CPUs due to implementation complexity. Additionally, it is not compatible with architecture class profiles. - -The Zcmp extension depends on the <> extension. - -The PUSH/POP assembly syntax uses several variables, the meaning of which are: - -* _reg_list_ is a list containing 1 to 13 registers (ra and 0 to 12 s registers) -** valid values: {ra}, {ra, s0}, {ra, s0-s1}, {ra, s0-s2}, ..., {ra, s0-s8}, {ra, s0-s9}, {ra, s0-s11} -** note that {ra, s0-s10} is _not_ valid, giving 12 lists not 13 for better encoding -* _stack_adj_ is the total size of the stack frame. -** valid values vary with register list length and the specific encoding, see the instruction pages for details. 
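To make the _reg_list_ rule above concrete, the following sketch enumerates the valid lists and shows why there are 12 of them rather than 13. The function names `valid_reg_lists` and `format_reg_list` are hypothetical and exist only for this example; valid _stack_adj_ values are not modelled because they depend on the individual encodings.

```python
def valid_reg_lists():
    """Enumerate the valid Zcmp register lists (illustrative sketch)."""
    lists = [["ra"]]
    # Highest s-register in each list: s0..s9, then s11 ({ra, s0-s10} is not valid).
    for highest in list(range(10)) + [11]:
        lists.append(["ra"] + [f"s{i}" for i in range(highest + 1)])
    return lists


def format_reg_list(regs):
    """Render a register list in the assembly syntax used above."""
    s_regs = regs[1:]
    if not s_regs:
        return "{ra}"
    if len(s_regs) == 1:
        return "{ra, s0}"
    return "{ra, s0-" + s_regs[-1] + "}"
```

Calling `[format_reg_list(r) for r in valid_reg_lists()]` produces the 12 spellings from `{ra}` through `{ra, s0-s11}`, with register counts 1 to 11 and 13.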
- -[%header,cols="^1,^1,4,8"] -|=== -|RV32 -|RV64 -|Mnemonic -|Instruction - -|yes -|yes -|cm.push _{reg_list}, -stack_adj_ -|<<#insns-cm_push>> - -|yes -|yes -|cm.pop _{reg_list}, stack_adj_ -|<<#insns-cm_pop>> - -|yes -|yes -|cm.popret _{reg_list}, stack_adj_ -|<<#insns-cm_popret>> - -|yes -|yes -|cm.popretz _{reg_list}, stack_adj_ -|<<#insns-cm_popretz>> - -|yes -|yes -|cm.mva01s _rs1', rs2'_ -|<<#insns-cm_mva01s>> - -|yes -|yes -|cm.mvsa01 _r1s', r2s'_ -|<<#insns-cm_mvsa01>> - -|=== - -<<< - -[#Zcmt] -=== Zcmt - -Zcmt adds the table jump instructions and also adds the JVT CSR. The JVT CSR requires a -state enable if Smstateen is implemented. See <> for details. - -This extension reuses some encodings from _c.fsdsp_. Therefore it is _incompatible_ with <>, - which is included when C and D extensions are both present. - -NOTE: Zcmt is primarily targeted at embedded class CPUs due to implementation complexity. Additionally, it is not compatible with architecture class profiles. - -The Zcmt extension depends on the <> and Zicsr extensions. - -[%header,cols="^1,^1,4,8"] -|=== -|RV32 -|RV64 -|Mnemonic -|Instruction - -|yes -|yes -|cm.jt _index_ -|<<#insns-cm_jt>> - -|yes -|yes -|cm.jalt _index_ -|<<#insns-cm_jalt>> - -|=== - -[#Zc_formats] -=== Zc instruction formats - -Several instructions in this specification use the following new instruction formats. 
- -[%header,cols="2,3,2,1,1,1,1,1,1,1,1,1,1"] -|===================================================================== -| Format | instructions | 15:10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 -| CLB | c.lbu | funct6 3+| rs1' 2+| uimm 3+| rd' 2+| op -| CSB | c.sb | funct6 3+| rs1' 2+| uimm 3+| rs2' 2+| op -| CLH | c.lhu, c.lh | funct6 3+| rs1' | funct1 | uimm 3+| rd' 2+| op -| CSH | c.sh | funct6 3+| rs1' | funct1 | uimm 3+| rs2' 2+| op -| CU | c.[sz]ext.*, c.not | funct6 3+| rd'/rs1' 5+| funct5 2+| op -| CMMV | cm.mvsa01 cm.mva01s| funct6 3+| r1s' 2+| funct2 3+| r2s' 2+| op -| CMJT | cm.jt cm.jalt | funct6 8+| index 2+| op -| CMPP | cm.push*, cm.pop* | funct6 2+| funct2 4+| urlist 2+| spimm 2+| op -|===================================================================== - -NOTE: c.mul uses the existing CA format - -[#Zcb_instructions] -=== Zcb instructions - -include::c_lbu.adoc[] -include::c_lhu.adoc[] -include::c_lh.adoc[] -include::c_sb.adoc[] -include::c_sh.adoc[] - -include::c_zext_b.adoc[] -include::c_sext_b.adoc[] -include::c_zext_h.adoc[] -include::c_sext_h.adoc[] -include::c_zext_w.adoc[] -include::c_not.adoc[] -include::c_mul.adoc[] - -include::pushpop.adoc[] -include::cm_push.adoc[] -include::cm_pop.adoc[] -include::cm_popretz.adoc[] -include::cm_popret.adoc[] -include::cm_mvsa01.adoc[] -include::cm_mva01s.adoc[] - -include::tablejump.adoc[] -include::jvt_csr.adoc[] -include::cm_jt.adoc[] -include::cm_jalt.adoc[] - -- cgit v1.1 From 146c3738a7a2866a6d8143dfbc2764ac8e7117f3 Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 22 Feb 2024 10:43:08 -0500 Subject: Removed original Zc chapter folder. Removing Zc chapter folder as all content is now in 1 file, zc.adoc. 
--- src/zc/.gitignore | 4 - src/zc/Zcb_footer.adoc | 12 - src/zc/Zcf_footer.adoc | 12 - src/zc/Zcmb_footer.adoc | 12 - src/zc/Zcmd.adoc | 22 - src/zc/Zcmd.pdf | 2387 --------------------------- src/zc/Zcmd_footer.adoc | 12 - src/zc/Zcmp_footer.adoc | 12 - src/zc/Zcmpe_footer.adoc | 12 - src/zc/Zcmt_footer.adoc | 12 - src/zc/c_lbsb_imm_offset.adoc | 8 - src/zc/c_lbu.adoc | 48 - src/zc/c_lh.adoc | 50 - src/zc/c_lhsh_imm_offset.adoc | 8 - src/zc/c_lhu.adoc | 50 - src/zc/c_mul.adoc | 50 - src/zc/c_not.adoc | 52 - src/zc/c_sb.adoc | 48 - src/zc/c_sext_b.adoc | 50 - src/zc/c_sext_h.adoc | 51 - src/zc/c_sh.adoc | 50 - src/zc/c_zca_required.adoc | Bin 60 -> 0 bytes src/zc/c_zext_b.adoc | 55 - src/zc/c_zext_h.adoc | 51 - src/zc/c_zext_w.adoc | 53 - src/zc/changes_since_v0.50.adoc | 130 -- src/zc/cm_decbnez.adoc | 51 - src/zc/cm_jalt.adoc | 74 - src/zc/cm_jt.adoc | 74 - src/zc/cm_lb.adoc | 49 - src/zc/cm_lbsb_imm_offset.adoc | 9 - src/zc/cm_lbu.adoc | 52 - src/zc/cm_lh.adoc | 53 - src/zc/cm_lhsh_imm_offset.adoc | 9 - src/zc/cm_lhu.adoc | 55 - src/zc/cm_mva01s.adoc | 63 - src/zc/cm_mvsa01.adoc | 68 - src/zc/cm_pop.adoc | 49 - src/zc/cm_pop_popret_loads_pseudo_code.adoc | 25 - src/zc/cm_pop_pseudo_code.adoc | 7 - src/zc/cm_popret.adoc | 50 - src/zc/cm_popret_pseudo_code.adoc | 9 - src/zc/cm_popretz.adoc | 49 - src/zc/cm_popretz_pseudo_code.adoc | 14 - src/zc/cm_push.adoc | 49 - src/zc/cm_push_pseudo_code.adoc | 7 - src/zc/cm_push_stores_pseudo_code.adoc | 25 - src/zc/cm_sb.adoc | 54 - src/zc/cm_sh.adoc | 55 - src/zc/example.bib | 40 - src/zc/jvt_csr.adoc | 68 - src/zc/pushpop.adoc | 354 ---- src/zc/pushpop_extra_info.adoc | 23 - src/zc/pushpop_vars.adoc | 91 - src/zc/readme.md | 15 - src/zc/tablejump.adoc | 49 - src/zc/variable_def.adoc | 1 - 57 files changed, 4842 deletions(-) delete mode 100644 src/zc/.gitignore delete mode 100644 src/zc/Zcb_footer.adoc delete mode 100644 src/zc/Zcf_footer.adoc delete mode 100644 src/zc/Zcmb_footer.adoc delete mode 100644 src/zc/Zcmd.adoc 
delete mode 100644 src/zc/Zcmd.pdf delete mode 100644 src/zc/Zcmd_footer.adoc delete mode 100644 src/zc/Zcmp_footer.adoc delete mode 100644 src/zc/Zcmpe_footer.adoc delete mode 100644 src/zc/Zcmt_footer.adoc delete mode 100644 src/zc/c_lbsb_imm_offset.adoc delete mode 100644 src/zc/c_lbu.adoc delete mode 100644 src/zc/c_lh.adoc delete mode 100644 src/zc/c_lhsh_imm_offset.adoc delete mode 100644 src/zc/c_lhu.adoc delete mode 100644 src/zc/c_mul.adoc delete mode 100644 src/zc/c_not.adoc delete mode 100644 src/zc/c_sb.adoc delete mode 100644 src/zc/c_sext_b.adoc delete mode 100644 src/zc/c_sext_h.adoc delete mode 100644 src/zc/c_sh.adoc delete mode 100644 src/zc/c_zca_required.adoc delete mode 100644 src/zc/c_zext_b.adoc delete mode 100644 src/zc/c_zext_h.adoc delete mode 100644 src/zc/c_zext_w.adoc delete mode 100644 src/zc/changes_since_v0.50.adoc delete mode 100644 src/zc/cm_decbnez.adoc delete mode 100644 src/zc/cm_jalt.adoc delete mode 100644 src/zc/cm_jt.adoc delete mode 100644 src/zc/cm_lb.adoc delete mode 100644 src/zc/cm_lbsb_imm_offset.adoc delete mode 100644 src/zc/cm_lbu.adoc delete mode 100644 src/zc/cm_lh.adoc delete mode 100644 src/zc/cm_lhsh_imm_offset.adoc delete mode 100644 src/zc/cm_lhu.adoc delete mode 100644 src/zc/cm_mva01s.adoc delete mode 100644 src/zc/cm_mvsa01.adoc delete mode 100644 src/zc/cm_pop.adoc delete mode 100644 src/zc/cm_pop_popret_loads_pseudo_code.adoc delete mode 100644 src/zc/cm_pop_pseudo_code.adoc delete mode 100644 src/zc/cm_popret.adoc delete mode 100644 src/zc/cm_popret_pseudo_code.adoc delete mode 100644 src/zc/cm_popretz.adoc delete mode 100644 src/zc/cm_popretz_pseudo_code.adoc delete mode 100644 src/zc/cm_push.adoc delete mode 100644 src/zc/cm_push_pseudo_code.adoc delete mode 100644 src/zc/cm_push_stores_pseudo_code.adoc delete mode 100644 src/zc/cm_sb.adoc delete mode 100644 src/zc/cm_sh.adoc delete mode 100644 src/zc/example.bib delete mode 100644 src/zc/jvt_csr.adoc delete mode 100644 src/zc/pushpop.adoc delete mode 
100644 src/zc/pushpop_extra_info.adoc delete mode 100644 src/zc/pushpop_vars.adoc delete mode 100644 src/zc/readme.md delete mode 100644 src/zc/tablejump.adoc delete mode 100644 src/zc/variable_def.adoc diff --git a/src/zc/.gitignore b/src/zc/.gitignore deleted file mode 100644 index feddacc..0000000 --- a/src/zc/.gitignore +++ /dev/null @@ -1,4 +0,0 @@ -*.svg -.asciidoctor/ - - diff --git a/src/zc/Zcb_footer.adoc b/src/zc/Zcb_footer.adoc deleted file mode 100644 index 1c8122d..0000000 --- a/src/zc/Zcb_footer.adoc +++ /dev/null @@ -1,12 +0,0 @@ - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zcb (<>) -|{version-label} -|{lifecycle-state} -|=== diff --git a/src/zc/Zcf_footer.adoc b/src/zc/Zcf_footer.adoc deleted file mode 100644 index 62f336a..0000000 --- a/src/zc/Zcf_footer.adoc +++ /dev/null @@ -1,12 +0,0 @@ - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zcf (<>) -|{version-label} -|{lifecycle-state} -|=== diff --git a/src/zc/Zcmb_footer.adoc b/src/zc/Zcmb_footer.adoc deleted file mode 100644 index ac73f23..0000000 --- a/src/zc/Zcmb_footer.adoc +++ /dev/null @@ -1,12 +0,0 @@ - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zcmb (<>) -|v0.70.5 -|{lifecycle-state} -|=== diff --git a/src/zc/Zcmd.adoc b/src/zc/Zcmd.adoc deleted file mode 100644 index a5ff18e..0000000 --- a/src/zc/Zcmd.adoc +++ /dev/null @@ -1,22 +0,0 @@ -[#Zcmd] -==== Zcmd v0.1 - -This document is in the Development state. Assume everything can change. 
 For more information see: -https://riscv.org/spec-state - -[%header,cols="^1,^1,4,8"] -|=== -|RV32 -|RV64 -|Mnemonic -|Instruction - -|✓ -|✓ -|cm.decbnez t0, imm -|<<#insns-cm_decbnez>> - -|=== - -include::cm_decbnez.adoc[] - diff --git a/src/zc/Zcmd.pdf b/src/zc/Zcmd.pdf deleted file mode 100644 index 9035979..0000000 --- a/src/zc/Zcmd.pdf +++ /dev/null @@ -1,2387 +0,0 @@ -%PDF-1.4 [raw PDF object and content-stream data of the deleted file omitted]
-0.8667 0.8667 0.8667 SCN -297.64 266.995 m -422.34 266.995 l -S -[] 0 d -1.25 w -0.8667 0.8667 0.8667 SCN -297.64 244.795 m -422.34 244.795 l -S -[] 0 d -0.2 w -0.8667 0.8667 0.8667 SCN -297.64 267.095 m -297.64 244.17 l -S -[] 0 d -0.2 w -0.8667 0.8667 0.8667 SCN -422.34 267.095 m -422.34 244.17 l -S -[] 0 d -1 w -0.0 0.0 0.0 SCN -0.2196 0.2196 0.2196 scn - -BT -300.64 252.745 Td -/F2.0 10.5 Tf -<4d696e696d756d2076657273696f6e> Tj -ET - -0.0 0.0 0.0 scn -0.2 w -0.8667 0.8667 0.8667 SCN -422.34 266.995 m -547.04 266.995 l -S -[] 0 d -1.25 w -0.8667 0.8667 0.8667 SCN -422.34 244.795 m -547.04 244.795 l -S -[] 0 d -0.2 w -0.8667 0.8667 0.8667 SCN -422.34 267.095 m -422.34 244.17 l -S -[] 0 d -0.2 w -0.8667 0.8667 0.8667 SCN -547.04 267.095 m -547.04 244.17 l -S -[] 0 d -1 w -0.0 0.0 0.0 SCN -0.2196 0.2196 0.2196 scn - -BT -425.34 252.745 Td -/F2.0 10.5 Tf -<4c6966656379636c65207374617465> Tj -ET - -0.0 0.0 0.0 scn -1.25 w -0.8667 0.8667 0.8667 SCN -48.24 244.795 m -297.64 244.795 l -S -[] 0 d -0.2 w -0.8667 0.8667 0.8667 SCN -48.24 222.595 m -297.64 222.595 l -S -[] 0 d -0.2 w -0.8667 0.8667 0.8667 SCN -48.24 245.42 m -48.24 222.495 l -S -[] 0 d -0.2 w -0.8667 0.8667 0.8667 SCN -297.64 245.42 m -297.64 222.495 l -S -[] 0 d -1 w -0.0 0.0 0.0 SCN -0.2196 0.2196 0.2196 scn - -BT -51.24 230.545 Td -/F1.0 10.5 Tf -<5a636d642028> Tj -ET - -0.2588 0.5451 0.7922 scn -0.2588 0.5451 0.7922 SCN - -BT -83.664 230.545 Td -/F1.0 10.5 Tf -<5a636d642076302e31> Tj -ET - -0.0 0.0 0.0 SCN -0.2196 0.2196 0.2196 scn - -BT -130.263 230.545 Td -/F1.0 10.5 Tf -<29> Tj -ET - -0.0 0.0 0.0 scn -1.25 w -0.8667 0.8667 0.8667 SCN -297.64 244.795 m -422.34 244.795 l -S -[] 0 d -0.2 w -0.8667 0.8667 0.8667 SCN -297.64 222.595 m -422.34 222.595 l -S -[] 0 d -0.2 w -0.8667 0.8667 0.8667 SCN -297.64 245.42 m -297.64 222.495 l -S -[] 0 d -0.2 w -0.8667 0.8667 0.8667 SCN -422.34 245.42 m -422.34 222.495 l -S -[] 0 d -1 w -0.0 0.0 0.0 SCN -0.2196 0.2196 0.2196 scn - -BT -300.64 230.545 Td -/F1.0 10.5 Tf 
-<302e31> Tj -ET - -0.0 0.0 0.0 scn -1.25 w -0.8667 0.8667 0.8667 SCN -422.34 244.795 m -547.04 244.795 l -S -[] 0 d -0.2 w -0.8667 0.8667 0.8667 SCN -422.34 222.595 m -547.04 222.595 l -S -[] 0 d -0.2 w -0.8667 0.8667 0.8667 SCN -422.34 245.42 m -422.34 222.495 l -S -[] 0 d -0.2 w -0.8667 0.8667 0.8667 SCN -547.04 245.42 m -547.04 222.495 l -S -[] 0 d -1 w -0.0 0.0 0.0 SCN -0.2196 0.2196 0.2196 scn - -BT -425.34 230.545 Td -/F1.0 10.5 Tf -<446576656c6f706d656e74> Tj -ET - -0.0 0.0 0.0 scn -q -0.0 0.0 0.0 scn -0.0 0.0 0.0 SCN -1 w -0 J -0 j -[] 0 d -/Stamp2 Do -0.2196 0.2196 0.2196 scn -0.2196 0.2196 0.2196 SCN -0.0 0.0 0.0 SCN -0.0 0.0 0.0 scn -0.2196 0.2196 0.2196 scn -0.2196 0.2196 0.2196 SCN -0.0 0.0 0.0 SCN -0.0 0.0 0.0 scn -0.2196 0.2196 0.2196 scn -0.2196 0.2196 0.2196 SCN -0.0 0.0 0.0 SCN -0.0 0.0 0.0 scn -0.2196 0.2196 0.2196 scn -0.2196 0.2196 0.2196 SCN - -BT -49.24 825.4592 Td -/F1.0 9 Tf -<636d2e646563626e657a3a205468697320697320696e2074686520> Tj -ET - -0.0 0.0 0.0 SCN -0.0 0.0 0.0 scn -0.2196 0.2196 0.2196 scn -0.2196 0.2196 0.2196 SCN - -BT -150.904 825.4592 Td -/F4.0 9 Tf -<646576656c6f706d656e74> Tj -ET - -0.0 0.0 0.0 SCN -0.0 0.0 0.0 scn -0.2196 0.2196 0.2196 scn -0.2196 0.2196 0.2196 SCN - -BT -198.046 825.4592 Td -/F1.0 9 Tf -<2070686173652c20666f722062656e63686d61726b696e6720616e642070726f746f747970696e67206f6e6c79207c20506167652032> Tj -ET - -0.0 0.0 0.0 SCN -0.0 0.0 0.0 scn -Q -q -0.0 0.0 0.0 scn -0.0 0.0 0.0 SCN -1 w -0 J -0 j -[] 0 d -/Stamp4 Do -0.2196 0.2196 0.2196 scn -0.2196 0.2196 0.2196 SCN - -BT -49.24 16.675 Td -/F1.0 9 Tf -<5a636d642076302e31207c20a920524953432d56> Tj -ET - -0.0 0.0 0.0 SCN -0.0 0.0 0.0 scn -Q -Q - -endstream -endobj -18 0 obj -<< /Type /Page -/Parent 3 0 R -/MediaBox [0 0 595.28 841.89] -/CropBox [0 0 595.28 841.89] -/BleedBox [0 0 595.28 841.89] -/TrimBox [0 0 595.28 841.89] -/ArtBox [0 0 595.28 841.89] -/Contents 17 0 R -/Resources << /ProcSet [/PDF /Text /ImageB /ImageC /ImageI] -/Font << /F1.0 12 0 R -/F4.0 
¯7ëh†&*W˜yŸ²&[]nHkkV­É`¡buÕÜ0”­˜ž¶&1Q]‘DîÓyëÂVÓ] k GxåmŒ·qlYM]1kèt©+]5€—±ExáNô#¨ Û¿?"´amxÇ–©H‹E[àÉXB†BC´Ö/ÉZ8¯lù£QçPw•+´äýé&Xù8mFÆ)X{NŽEr4²H¹‡ì1gÔl=J¦ütQÖ¢·ßí‚XóaÁ#¤ }Q7’Ù/ç ·¤ÿίîAbþ¢Dߊj‚"ùä°¾ÍÓéIgÒ™|$“þ_ÔÌ=‹~ üÀÛàSKä ¶4­4ˆ2]DCOwa¾âdž†pŒ§Q± ÚC¼•×™´M#Uo‚&·Ã`ÿr#—±@©ZX!£!ÄÄt ©ÚŠÖ‘ÍaW ³|‹ù5ûˆGX!Tî…HT°Œ9úAÈSiOlăhŽËÀ2nò½Ÿ>5µÃãzÓyͯÈÁµ= {H” -’Ëãò—ϽJ¥7“j óó†[ {ä ö/v H’ävÉbvÏ£Ü/†æÀ/&ÈÒŽ#? è1µÞö ?”¡¬óØ,^³Tp~*g‰ -™%ó×ۢʓK ƒJƒµ#_ÍÜ°²"åáŒí°±ü¸®Ón]_~<¯¯¿e@¤»‘±­+é ÇÅ„æ:à’|¶Þ‚‹À=®Qìv´:Þ†æÛpbªŠ5™•ÿ›½=ÎJÅÔ˜Ü"·‚i”*yý+FnçzP|JOgè Р]W>¤çgMÛ'äÏÿ xv€÷9MÚSKŠ9îÀr‚õÕh6^ëʺ«LºfÅ@i²õvL⊥-Å°+!ºPhÆT,]¬  ÛƒÙ„+ ‘!ÈcB£ÖØÀ›W4C‰Ø1¢HszOžû¾S±Ò§: ú§ŒOœßç=;mBæwwŒw~ õƒåþ»oÅÅÇÈ$¹NÚ#èÝv7,/è¾wñòbhL¡IäV À¥‚KhSUç™F¨’묪"Ò,S· -è×4« „ì±Ý`D½Â"H\­ºûX{ª -ý-Þ' †©haÇÏÜ6âùE5ŽyáîV‹õh+ÞÔÈv´­@˜u¶XA #–w¸·½ÅJšoºT˜e—(qð:Áµ‡›J ž²Ù¶'ö8‰J"K¹Ùõ¡È…B”JÇöŸÓÊ“góz\ÞLÞëw‰¢Ð±8´¯–‘Ÿß·ÿÔÑÓ»)8-Éå–3™Ïé!·otÇÞ’9î—r.õî掇zÙ|!/ù½QåìX_WÍ+äµGš»mHªË/»;pDß+9ÇNÁo±PÝN ³ÐÔÛYnY³R0“gÔ9ENµ2 V!°”Pׄ°4Ì„°›-d!ÕòcQâê´Kgn¿ÀЯžA°;é°“X“wÓ:ý5=KsôscZP]Ï.O;è_.åsë -$hãP¥“£'ÈÝFÚ-Ô³>;ŠvÕEý¨c©†¥(͈²ÑªµÝ ¡Q ¡,_i5´,¯ZQ톥Ԯ0¥Âº ,R­0¨U—jÕ0Äò8ÅØ©¨+.¥†!`„Ánn†A¸ Á uEoäq)À -[1±Å'ûí'aI—@ðŒÞÒÑ6ÀPUõVÛ¢»f« ¯Ä^‹|UrEô‚Ù_¿1UQÂâ+šxW­¿ÞøG"ed¨J7½U‘&š[Š+Oã;S…Ìó²üTN׌|hæ°ª ºž]”åg½r‡P=tžÖ 2’½¾b¨|ùehÒ?ôy¯‡½8þîw7ò Ñ›ïB~TÈùuÒ.¡»V]ÆR¢‡ÔÐò€è°çaÍê€bÞ¬9–æÚ­²µ¼È  È9j{NàNw]5ò¨:4Fƒt4=öôt´sO9I¡ÆŒïŸÌoërer`PÔÁá™ÉÆÇÊÅC.úØsO¥*•¬PM]’G‹'µÞ\ÈØYþÏ· 4>ïÝÓÔ·åtÙ{\hpÞ…n¾ 9Ún•´a÷ôM-žÇïµu×®œ¼roH·*ç”7!‚ŧÐâóU¬!¸dXJƒB-˜¶¶Š™Á¶tPO Äxô£Ó3gÂõÿä?ÑêÞ€’ðáCì ‰àÅÚÑgV?¾8õíÙ_èH$ƒÿð©Ýv|,BóšpšlC©Ë.I¦ÞV$ç¤IUcž" -€1‹6P@,¦¯Êº¯1±neÐCÕÛ™nlsð@7šÝh„~Þj9jYDµÌÜ:+µÓ›¤%K·²` ·ÛN?‹…ùû'Œ©¦àV ß(èÏŸ×ÕÐúóÆùëô ™¡;v ©ñnjnkB‚üµ´O¥PRüVÉõ¯A>CÚuÔ¾l“Ïv²PÆ8º‘·£¢}žAÙÏ’Ëx°Vo—yŠP®¡ÜÊ[岊§ÕXàØ–;‘ÌÀ3É›tPÌA˜5Ë@jÔ²G€NØñ¼þÖíiµwççäó ç`q…ŒfN?¬SÈF÷ êÆ+z®ÙQŽô3…×™tA ¿'iÒÈÁ}ocDº^«óc}¯ÑûÍ›ø-˜~]xD튎Ðãa)s}ƒæ¿Ÿ¢xöÙ¨%·DYŒó%8S*2%°µ¢‰ÿbÉ°QÎŒŸúx¯þ»ú§sÆ:Óm\õ÷Þ±ã%ÿÜ,è -äÕHÚ»ñýùþ—Ôjº³44Þ™¨Û>ßðLrõò¬gŠ£5îåÙÝ8Çl|b6¾‰ -ㆇ2”ìÝúcЭä]ç|`¢a¹A n~>à–ç’™ið#ó>~œÑg";QëX¸nmçŸÊ­²ï:Û®r2 ZñåzÝEkÈwÝÚeWy¡Àª°ÀšU“o°ÆT¥+ÁPÀ>Çì¯5 2«cÛ†þ–r«ÞjÃ*è‘×`¨¶YjÑ÷ù©A¿;K-«w;¨Œ;Ö‡*ãÓÚ‰ŒÉ?<º#ƒ>Í 
-ÁT,Ìö‘÷x\g*B]WlÔa^Yíl3'Ú§X8»Ä^_,áîxtÏÎáɇcÇžÙ]’d¿;M¥ ÏEß»'3õÚ„ÖϱSôÞŸÇåIg/DÂé¹R®£Ë¯*Ù©òTÊãv뇼JÈåõÄi«3¥¥‚ämôßS*¸$”U -dõu¡r ÐÃNÔÃ:ÃNîIJyÏÎ{ðX­g([² 00ü:æYP¿w‡7óŸ°¬`„˜ü¤Ãž¸l&?rÄÉy¢9ÁÎCíï:.“— )ãDV̶óù¶ìêЛY!ÿÀöOÏšöÜg÷¥3ëïú#É`Þ tÂ#£OSü­ÖPïÑWI?¹LÚ½¨‡FQ´+NV²ZƒÙÿV¹f¹ÁW܈w…ÇLvqÀ.69AàÎÛá^“€Uâ=õ⇋^p/ùnКÆÜ-&`’·Q]јC–·¥< ä¿1@bCÆ9Ø•«ß!Ñá¬~HÊë‡CA¡_ÏpGHÊÝ~¿wý/ÔpÊ?hx"ßg³þåL:è÷÷ERÝI•J¶ŸÙàÁ(ùŽý ‹6ñ©†ÕB›ŒØ'Û«Ö˜mR­Á+¬Uaƒk–Û{ƒyÖ¬Þî+¬·Âº×,á5+Ác9K »a$[yõ†Äô5™µÔ•¡Ö ˜\“·Ãض¡ÝbxCP·È„×£†õîÞæðàÐÖïA7À#y>ÓÚ:µyÂaµjÀâ`ßfъ籑qÚ䦖ˆ˜ÝvE¦Š.^·ã¿ UpP‘ÅdðÇs¹ñQ©*z“º»ÇÓϩ곆ÞéÊÑL$¦ÈY,ËÞŒËP\‚8¬„1ÁÆ3ÝoD]Žè넪)åS!Ï~IÏíM‡ŒTò‹ž°,\¥ÅN›÷x~õð~'“öòÒ¶QùúŠ WZ˜Ìa¾Íô†•Ââ(ÓÅå°«jí¶å g®0¹"±¾ŠÌdÕ Ý`}ª• Ý°|-.!ßär7XjM‚ù—œž+ØʲW<¾V¤KÝ8é±Á)u%*8Uâàô&¸¸ …ÁhŸ³ÉÈæ&ApÞ³Ež -þ6@ñøÒ}ÅÒˆk«(˜JgÜ•] Jç€í¡ 2â©ÎÐY@b4ßñS«ÍC+ÝtÇú»3žó^-£k¡§ zô¾£Q£ð´žÏP!äY|ÆÔó¹g #zꑸ¡_†ä(qGÁ÷¼/\ZÂLÑ÷‹>¼ò“Îo8’7³ôç »òw¤&v]Ôîázô ¿¨ñÓ¬¶ÛŸà²¬²ìµUÓMz!òÙR¬÷_aõ -ë·­É»fýW˜ÄZÄOx^°œ$XŽDäö«Ý^7+ª+Þ¢?R”VJx[)ó¶ÂÛ*¶mX»åLVm‘H¹Å*-VE¹x¼Éj±TÞjb·ÇëO¼~ÛÄf¬3ñëA׆sæLŽ$†"[Ä!ný«@x—›C=¢YH—uÝ<sÏuŒgòfáKõ¸ªÅdÅHFC!wHäÏw`L©õw[:Íò¯9c¿Q4´X"¤øEo"ÚÛ¦øBnAÞ"í§.ÈS+ø;üõLR*:7Ê|üc.7Ú7<”&Äy>úV*¬¥ÓZ$ùV8•êL¢YÚ5 È7ŽÙol#ÿ‚êÉ'nÔz žnñóožk…øÁEh#»„ŒËëá?CºUîáIÿÖ‚ïñ±S— ë_4æ ùõoãÇ¡I:¤¯¿b×|øÃáŸlþ¶»¸QB‘݈LƾHÉ.§/yÂéË$H^pú.’&¿µùKçù§ ßnÚý IÓ¯ÀnTÂ߀¥ßtú”t i§/¯0áôEò¨p§Ó—H]Xuú2é~ìôýd·(8ý}QœrúA2 ½½kþÂŹÅ幓ú#Oê;.œ\œ{R?ZÑÌ^˜¿¨ŸZœ?¯ß;7­Ÿš¿°¼´ëî#ú½O.Ì]Z<»<·ð'–ï:{úÌòLNÍ/žžÓë•š>¢À:/„¹þáòP¹^«µ~颣s‹Kgç/èµJvžj4EX}kq™¯;³¼¼0R­ž8_~üÂÙó'ç*Kó/ž˜;…˜T.Ì-Wï=svIGäôÃó§–/Í.Îé8wöÄÜ…% ÷ñ 'áÅËgæôÃûîÒïY˜»`/¾Ë^PÒ70é¯ôW¼|3çYÜæÄüÂYØdyþôl±¨_:»|6ƒ䕾0{â±YàÈÙ ú=SwU–ŸX.yg/œÄ'gÏ-Íë³gÏž›}äÜœýà¬>µã >»<¢;d-X<»°¼TY:{®Ua“_>š7O.‹dŽ,’ehO·cÄQ‡]*¤éà†ó˰ϩ¿À2yV…þr?9N¾A¾GÈÿÁj— -endstream -endobj -50 0 obj -<< /Type /FontDescriptor -/FontName /8a6373+CMUTypewriter-Light -/FontFile2 49 0 R -/FontBBox [-203 -390 729 1045] -/Flags 4 -/StemV 0 -/ItalicAngle 0 -/Ascent 775 -/Descent -225 -/CapHeight 775 -/XHeight 0 ->> -endobj -51 0 obj -<< /Length 1278 -/Filter [/FlateDecode] ->> -stream -xœe×ËnÛF†á½®BËtHs&Ã@‘n¼èu{stÔ’ + 
ß}ù½¤ikÀÆ/‰œy¾_Ã!}øôôÓÓùtß~»]ês¿ïÇéÜnýíòåVû¾ô—Óygì¾êýë+þÖ×|ݶ“Ÿßßîýõé<.û‡‡Ýá÷í÷ûí}ÿáÇv)ý‡Ýá×[ë·ÓùeÿáÏOÏÛëç/×ë_ýµŸïûãîñqßúØú9_ɯ}à´Omûütÿ¸óϼ_ûÞòÚLL½´þv͵ßòù¥ïŽÇLJ1wýÜþó‘9ç)eÔÏù6=n?[i(JKiU:J§ÒSz•2¨Œ”Qe¢L*ÊEåJ¹ªÌ”Ye¡,*+eUÙ(›ÊNÙUÊ-уÁkä5x¼¯‘×à5ò¼F^ƒ×Èkðy ^#¯Ákä5x¼¯‘×à5ò¼F^ƒ×Èkðy ^#¯Åkåµx­¼¯•×âµòZ¼V^‹×ÊkñZy-^+¯Åkåµx­¼¯•×âµòZ¼V^‹×ÊkñZy-^+¯Ãëäux¼¯“×áuò:¼N^‡×Éëð:y^'¯Ãëäux¼¯“×áuò:¼N^‡×Éëð:y^'¯Çëåõx½¼¯—×ãõòz¼^^×Ëëñzy=^/¯Çëåõx½¼¯—×ãõòz¼^^×Ëëñzy=^/oÀä xƒ¼o7à ò¼AÞ€7ÈðyÞ oÀä xƒ¼o7à ò¼AÞ€7ÈðyÞ oÄåx£¼o”7âÝþj·ùº«üo—‰$‰JI•$’$*I$IT’H’¨$‘$QI"I¢’D’D%‰$‰JI•$’$*I$IT’D’¤$‰$III’’$’$u>áMò&¼IÞ„7É›ð&yÞ$o›äMx“¼ o’7áMò&¼IÞ„7É›ð&y¼U†o•aÁ[5ñ‚·j¶oS orÁÛ4Û‚·sÞ®>,x»Â/x;³áíê·31ÞÎÄx;ãí -¿âíJ¼âíJ¼â특âíâ¬x»¯x»¯x‡+Þ!Êw¹âB®x‡b®x‡+ÞïrÅ;àà2äÍkçÎœñJ¼Y³e¼Y†Œ7+|œÆx‹ oSŠŒ7+[Æ›ețךyÞ¢oVûòæµ– -ã­Œ‹·é€"¯åþR¦W*–Ó4Xq”êC™^Jú[(£à®^ƒ1»¢y]•¡Èk¹}¼YM-x e#e®ÊÙ_y+ë¡hÜŠ7k¶:û«/ âåæZå5«dUýµl¼uz5n¥¿[ç¾ÛeBø÷&Si|y»(ô%q& %+S%ñ*Aª¾ÛÊBijPUÃ6\¥h,ž(¶+L¥ºÝ,¥‚4G©5Þ<¥ô…ÂcD› [˧©ñ¡*t› [ jj¼aijü¼IµÙxʪw¹g¶FÉ›wžØ¶¥§Rô>¶Þís¡hâ>½:­O¯VRÇËí¾³P*#àÙi|e0¼Uýí,ì*o§¿Uáûì/ãÎþ*[§¿Uê,”¦Nö¹°•m€lšx°:š&\}M³¹»)üÒðL2„´\%CHËF7æÕ§‰ÇÊ»BŽLÉ…’)æjæ€Æ»J1:%¦H߯:=Së©ÿÛ³zýr»méükÀó¹žÌOçþí¿‡ë媳ôû7±Ùð -endstream -endobj -52 0 obj -[525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 525 
diff --git a/src/zc/Zcmd_footer.adoc b/src/zc/Zcmd_footer.adoc deleted file mode 100644 index 8fdcd87..0000000 --- a/src/zc/Zcmd_footer.adoc +++ /dev/null @@ -1,12 +0,0 @@ - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zcmd (<>) -|0.1 -|Development -|=== diff --git a/src/zc/Zcmp_footer.adoc b/src/zc/Zcmp_footer.adoc deleted file mode 100644 index b0d3d4a..0000000 --- a/src/zc/Zcmp_footer.adoc +++ /dev/null @@ -1,12 +0,0 @@ - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zcmp (<>) -|{version-label} -|{lifecycle-state} -|=== diff --git a/src/zc/Zcmpe_footer.adoc b/src/zc/Zcmpe_footer.adoc deleted file mode 100644 index 1e7ba38..0000000 --- a/src/zc/Zcmpe_footer.adoc +++ /dev/null @@ -1,12 +0,0 @@ -
-Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zcmpe (<>) -|{version-label} -|Stable -|=== diff --git a/src/zc/Zcmt_footer.adoc b/src/zc/Zcmt_footer.adoc deleted file mode 100644 index 5206794..0000000 --- a/src/zc/Zcmt_footer.adoc +++ /dev/null @@ -1,12 +0,0 @@ - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zcmt (<>) -|{version-label} -|{lifecycle-state} -|=== diff --git a/src/zc/c_lbsb_imm_offset.adoc b/src/zc/c_lbsb_imm_offset.adoc deleted file mode 100644 index dd7bf83..0000000 --- a/src/zc/c_lbsb_imm_offset.adoc +++ /dev/null @@ -1,8 +0,0 @@ - -The immediate offset is formed as follows: -[source,sail] --- - uimm[31:2] = 0; - uimm[1] = encoding[5]; - uimm[0] = encoding[6]; --- diff --git a/src/zc/c_lbu.adoc b/src/zc/c_lbu.adoc deleted file mode 100644 index 06ad04b..0000000 --- a/src/zc/c_lbu.adoc +++ /dev/null @@ -1,48 +0,0 @@ -<<< -[#insns-c_lbu,reftext="Load unsigned byte, 16-bit encoding"] -==== c.lbu - -Synopsis:: -Load unsigned byte, 16-bit encoding - -Mnemonic:: -c.lbu _rd'_, _uimm_(_rs1'_) - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x0, attr: ['C0'] }, - { bits: 3, name: 'rd\'' }, - { bits: 2, name: 'uimm[0|1]' }, - { bits: 3, name: 'rs1\'' }, - { bits: 3, name: 0x0 }, - { bits: 3, name: 0x4, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -include::c_lbsb_imm_offset.adoc[] - -Description:: -This instruction loads a byte from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting byte is zero extended to XLEN bits and is written to _rd'_. - -[NOTE] -==== - _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. -==== - -Prerequisites:: -None - -32-bit equivalent:: -<> - -Operation:: -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
- -X(rdc) = EXTZ(mem[X(rs1c)+EXTZ(uimm)][7..0]); --- - -include::Zcb_footer.adoc[] diff --git a/src/zc/c_lh.adoc b/src/zc/c_lh.adoc deleted file mode 100644 index e89705a..0000000 --- a/src/zc/c_lh.adoc +++ /dev/null @@ -1,50 +0,0 @@ -<<< -[#insns-c_lh,reftext="Load signed halfword, 16-bit encoding"] -==== c.lh - -Synopsis:: -Load signed halfword, 16-bit encoding - -Mnemonic:: -c.lh _rd'_, _uimm_(_rs1'_) - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x0, attr: ['C0'] }, - { bits: 3, name: 'rd\'' }, - { bits: 1, name: 'uimm[1]' }, - { bits: 1, name: 0x1 }, - { bits: 3, name: 'rs1\'' }, - { bits: 3, name: 0x1 }, - { bits: 3, name: 0x4, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -include::c_lhsh_imm_offset.adoc[] - -Description:: -This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting halfword is sign extended to XLEN bits and is written to _rd'_. - -[NOTE] -==== - _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. -==== - -Prerequisites:: -None - -32-bit equivalent:: -<> - -Operation:: -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
- -X(rdc) = EXTS(load_mem[X(rs1c)+EXTZ(uimm)][15..0]); --- - -include::Zcb_footer.adoc[] - diff --git a/src/zc/c_lhsh_imm_offset.adoc b/src/zc/c_lhsh_imm_offset.adoc deleted file mode 100644 index 20f1b2b..0000000 --- a/src/zc/c_lhsh_imm_offset.adoc +++ /dev/null @@ -1,8 +0,0 @@ - -The immediate offset is formed as follows: -[source,sail] --- - uimm[31:2] = 0; - uimm[1] = encoding[5]; - uimm[0] = 0; --- diff --git a/src/zc/c_lhu.adoc b/src/zc/c_lhu.adoc deleted file mode 100644 index e6193fc..0000000 --- a/src/zc/c_lhu.adoc +++ /dev/null @@ -1,50 +0,0 @@ -<<< -[#insns-c_lhu,reftext="Load unsigned halfword, 16-bit encoding"] -==== c.lhu - -Synopsis:: -Load unsigned halfword, 16-bit encoding - -Mnemonic:: -c.lhu _rd'_, _uimm_(_rs1'_) - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x0, attr: ['C0'] }, - { bits: 3, name: 'rd\'' }, - { bits: 1, name: 'uimm[1]' }, - { bits: 1, name: 0x0 }, - { bits: 3, name: 'rs1\'' }, - { bits: 3, name: 0x1 }, - { bits: 3, name: 0x4, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -include::c_lhsh_imm_offset.adoc[] - -Description:: -This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting halfword is zero extended to XLEN bits and is written to _rd'_. - -[NOTE] -==== - _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. -==== - -Prerequisites:: -None - -32-bit equivalent:: -<> - -Operation:: -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. - -X(rdc) = EXTZ(load_mem[X(rs1c)+EXTZ(uimm)][15..0]); --- - -include::Zcb_footer.adoc[] - diff --git a/src/zc/c_mul.adoc b/src/zc/c_mul.adoc deleted file mode 100644 index 5ab6aeb..0000000 --- a/src/zc/c_mul.adoc +++ /dev/null @@ -1,50 +0,0 @@ -<<< -[#insns-c_mul,reftext="Multiply, 16-bit encoding"] -==== c.mul - -Synopsis:: -Multiply, 16-bit encoding - -Mnemonic:: -c.mul _rsd'_, _rs2'_ - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... 
-{reg:[ - { bits: 2, name: 0x1, attr: ['C1'] }, - { bits: 3, name: 'rs2\'', attr: ['SRC2'] }, - { bits: 2, name: 0x2, attr: ['FUNCT2'] }, - { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, - { bits: 3, name: 0x7 }, - { bits: 3, name: 0x4, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -Description:: -This instruction multiplies XLEN bits of the source operands from _rsd'_ and _rs2'_ and writes the lowest XLEN bits of the result to _rsd'_. - -[NOTE] -==== - _rd'/rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. -==== - -Prerequisites:: -M or Zmmul must be configured. - -32-bit equivalent:: -<> - -[NOTE] - - The SAIL module variable for _rd'/rs1'_ is called _rsdc_, and for _rs2'_ is called _rs2c_. - -Operation:: -[source,sail] --- -let result_wide = to_bits(2 * sizeof(xlen), signed(X(rsdc)) * signed(X(rs2c))); -X(rsdc) = result_wide[(sizeof(xlen) - 1) .. 0]; --- - -include::Zcb_footer.adoc[] - diff --git a/src/zc/c_not.adoc b/src/zc/c_not.adoc deleted file mode 100644 index 9a0bbd9..0000000 --- a/src/zc/c_not.adoc +++ /dev/null @@ -1,52 +0,0 @@ -<<< -[#insns-c_not,reftext="Bitwise not, 16-bit encoding"] -==== c.not - -Synopsis:: -Bitwise not, 16-bit encoding - -Mnemonic:: -c.not _rd'/rs1'_ - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x1, attr: ['C1'] }, - { bits: 3, name: 0x5, attr: ['C.NOT'] }, - { bits: 2, name: 0x3, attr: ['FUNCT2'] }, - { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, - { bits: 3, name: 0x7 }, - { bits: 3, name: 0x4, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -Description:: -This instruction takes the one's complement of _rd'/rs1'_ and writes the result to the same register. - -[NOTE] -==== - _rd'/rs1'_ is from the standard 8-register set x8-x15. -==== - -Prerequisites:: -None - -32-bit equivalent:: -[source,sail] --- -xori rd'/rs1', rd'/rs1', -1 --- - -[NOTE] - - The SAIL module variable for _rd'/rs1'_ is called _rsdc_. 
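Editorial sketch, not part of the patch: the c.mul operation and the c.not description above can be modelled in a few lines of Python. This is only an illustration under stated assumptions. The helper names `to_signed`, `c_mul` and `c_not` are hypothetical, registers are passed as unsigned integers, and XLEN defaults to 32; none of this is taken from the SAIL model.

```python
def to_signed(value: int, xlen: int) -> int:
    """Interpret the low xlen bits of value as a two's-complement integer."""
    value &= (1 << xlen) - 1
    return value - (1 << xlen) if value >> (xlen - 1) else value


def c_mul(rsd: int, rs2: int, xlen: int = 32) -> int:
    """Model of c.mul: multiply the operands and keep the lowest xlen bits."""
    wide = to_signed(rsd, xlen) * to_signed(rs2, xlen)  # full-width product
    return wide & ((1 << xlen) - 1)                     # low XLEN bits only


def c_not(rsd: int, xlen: int = 32) -> int:
    """Model of c.not: one's complement, equivalent to xori with -1."""
    return (rsd ^ -1) & ((1 << xlen) - 1)
```

Because only the low XLEN bits are kept, the signed and unsigned interpretations of the operands give the same result; the signed conversion is kept here only to mirror the pseudo-code.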
- -Operation:: -[source,sail] --- -X(rsdc) = X(rsdc) XOR -1; --- - -include::Zcb_footer.adoc[] - diff --git a/src/zc/c_sb.adoc b/src/zc/c_sb.adoc deleted file mode 100644 index 395d27a..0000000 --- a/src/zc/c_sb.adoc +++ /dev/null @@ -1,48 +0,0 @@ -<<< -[#insns-c_sb,reftext="Store byte, 16-bit encoding"] -==== c.sb - -Synopsis:: -Store byte, 16-bit encoding - -Mnemonic:: -c.sb _rs2'_, _uimm_(_rs1'_) - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x0, attr: ['C0'] }, - { bits: 3, name: 'rs2\'' }, - { bits: 2, name: 'uimm[0|1]' }, - { bits: 3, name: 'rs1\'' }, - { bits: 3, name: 0x2 }, - { bits: 3, name: 0x4, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -include::c_lbsb_imm_offset.adoc[] - -Description:: -This instruction stores the least significant byte of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. - -[NOTE] -==== - _rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. -==== - -Prerequisites:: -None - -32-bit equivalent:: -<> - -Operation:: -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. - -mem[X(rs1c)+EXTZ(uimm)][7..0] = X(rs2c) --- - -include::Zcb_footer.adoc[] diff --git a/src/zc/c_sext_b.adoc b/src/zc/c_sext_b.adoc deleted file mode 100644 index 2be52d0..0000000 --- a/src/zc/c_sext_b.adoc +++ /dev/null @@ -1,50 +0,0 @@ -<<< -[#insns-c_sext_b,reftext="Sign extend byte, 16-bit encoding"] -==== c.sext.b - -Synopsis:: -Sign extend byte, 16-bit encoding - -Mnemonic:: -c.sext.b _rd'/rs1'_ - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x1, attr: ['C1'] }, - { bits: 3, name: 0x1, attr: ['C.SEXT.B'] }, - { bits: 2, name: 0x3, attr: ['FUNCT2'] }, - { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, - { bits: 3, name: 0x7 }, - { bits: 3, name: 0x4, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -Description:: -This instruction takes a single source/destination operand. 
-It sign-extends the least-significant byte in the operand to XLEN bits by copying the most-significant bit -in the byte (i.e., bit 7) to all of the more-significant bits. - -[NOTE] -==== - _rd'/rs1'_ is from the standard 8-register set x8-x15. -==== - -Prerequisites:: -Zbb is also required. - -32-bit equivalent:: -<> from Zbb - -[NOTE] - - The SAIL module variable for _rd'/rs1'_ is called _rsdc_. - -Operation:: -[source,sail] --- -X(rsdc) = EXTS(X(rsdc)[7..0]); --- - -include::Zcb_footer.adoc[] diff --git a/src/zc/c_sext_h.adoc b/src/zc/c_sext_h.adoc deleted file mode 100644 index 28a8ebe..0000000 --- a/src/zc/c_sext_h.adoc +++ /dev/null @@ -1,51 +0,0 @@ -<<< -[#insns-c_sext_h,reftext="Sign extend halfword, 16-bit encoding"] -==== c.sext.h - -Synopsis:: -Sign extend halfword, 16-bit encoding - -Mnemonic:: -c.sext.h _rd'/rs1'_ - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x1, attr: ['C1'] }, - { bits: 3, name: 0x3, attr: ['C.SEXT.H'] }, - { bits: 2, name: 0x3, attr: ['FUNCT2'] }, - { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, - { bits: 3, name: 0x7 }, - { bits: 3, name: 0x4, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -Description:: -This instruction takes a single source/destination operand. -It sign-extends the least-significant halfword in the operand to XLEN bits by copying the most-significant bit -in the halfword (i.e., bit 15) to all of the more-significant bits. - -[NOTE] -==== - _rd'/rs1'_ is from the standard 8-register set x8-x15. -==== - -Prerequisites:: -Zbb is also required. - -32-bit equivalent:: -<> from Zbb - -[NOTE] - - The SAIL module variable for _rd'/rs1'_ is called _rsdc_. 
- -Operation:: -[source,sail] --- -X(rsdc) = EXTS(X(rsdc)[15..0]); --- - -include::Zcb_footer.adoc[] - diff --git a/src/zc/c_sh.adoc b/src/zc/c_sh.adoc deleted file mode 100644 index 992bed3..0000000 --- a/src/zc/c_sh.adoc +++ /dev/null @@ -1,50 +0,0 @@ -<<< -[#insns-c_sh,reftext="Store halfword, 16-bit encoding"] -==== c.sh - -Synopsis:: -Store halfword, 16-bit encoding - -Mnemonic:: -c.sh _rs2'_, _uimm_(_rs1'_) - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x0, attr: ['C0'] }, - { bits: 3, name: 'rs2\'' }, - { bits: 1, name: 'uimm[1]' }, - { bits: 1, name: '0' }, - { bits: 3, name: 'rs1\'' }, - { bits: 3, name: 0x3 }, - { bits: 3, name: 0x4, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -include::c_lhsh_imm_offset.adoc[] - -Description:: -This instruction stores the least significant halfword of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. - -[NOTE] -==== - _rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. -==== - -Prerequisites:: -None - -32-bit equivalent:: -<> - -Operation:: -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. - -mem[X(rs1c)+EXTZ(uimm)][15..0] = X(rs2c) --- - -include::Zcb_footer.adoc[] - diff --git a/src/zc/c_zca_required.adoc b/src/zc/c_zca_required.adoc deleted file mode 100644 index f7b460c..0000000 Binary files a/src/zc/c_zca_required.adoc and /dev/null differ diff --git a/src/zc/c_zext_b.adoc b/src/zc/c_zext_b.adoc deleted file mode 100644 index c13d39f..0000000 --- a/src/zc/c_zext_b.adoc +++ /dev/null @@ -1,55 +0,0 @@ -<<< -[#insns-c_zext_b,reftext="Zero extend byte, 16-bit encoding"] -==== c.zext.b - -Synopsis:: -Zero extend byte, 16-bit encoding - -Mnemonic:: -c.zext.b _rd'/rs1'_ - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... 
-{reg:[ - { bits: 2, name: 0x1, attr: ['C1'] }, - { bits: 3, name: 0x0, attr: ['C.ZEXT.B'] }, - { bits: 2, name: 0x3, attr: ['FUNCT2'] }, - { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, - { bits: 3, name: 0x7 }, - { bits: 3, name: 0x4, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -Description:: -This instruction takes a single source/destination operand. -It zero-extends the least-significant byte of the operand to XLEN bits by inserting zeros into all of -the bits more significant than 7. - -[NOTE] -==== - _rd'/rs1'_ is from the standard 8-register set x8-x15. -==== - -Prerequisites:: -None - -32-bit equivalent:: -[source,sail] --- -andi rd'/rs1', rd'/rs1', 0xff --- - -[NOTE] -==== - The SAIL module variable for _rd'/rs1'_ is called _rsdc_. -==== - -Operation:: -[source,sail] --- -X(rsdc) = EXTZ(X(rsdc)[7..0]); --- - -include::Zcb_footer.adoc[] - diff --git a/src/zc/c_zext_h.adoc b/src/zc/c_zext_h.adoc deleted file mode 100644 index 29a31a2..0000000 --- a/src/zc/c_zext_h.adoc +++ /dev/null @@ -1,51 +0,0 @@ -<<< -[#insns-c_zext_h,reftext="Zero extend halfword, 16-bit encoding"] -==== c.zext.h - -Synopsis:: -Zero extend halfword, 16-bit encoding - -Mnemonic:: -c.zext.h _rd'/rs1'_ - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x1, attr: ['C1'] }, - { bits: 3, name: 0x2, attr: ['C.ZEXT.H'] }, - { bits: 2, name: 0x3, attr: ['FUNCT2'] }, - { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, - { bits: 3, name: 0x7 }, - { bits: 3, name: 0x4, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -Description:: -This instruction takes a single source/destination operand. -It zero-extends the least-significant halfword of the operand to XLEN bits by inserting zeros into all of -the bits more significant than 15. - -[NOTE] -==== - _rd'/rs1'_ is from the standard 8-register set x8-x15. -==== - -Prerequisites:: -Zbb is also required. - -32-bit equivalent:: -<> from Zbb - -[NOTE] - - The SAIL module variable for _rd'/rs1'_ is called _rsdc_. 
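Editorial sketch, not part of the patch: the sign- and zero-extension behaviour described for c.sext.b, c.sext.h, c.zext.b and c.zext.h can be illustrated as below. The function names `exts` and `extz` simply echo the EXTS/EXTZ helpers used in the pseudo-code; their signatures, the `width` parameter and the default XLEN of 32 are assumptions made for this example.

```python
def exts(value: int, width: int, xlen: int = 32) -> int:
    """Sign-extend the low `width` bits of value to xlen bits."""
    value &= (1 << width) - 1
    if value >> (width - 1):          # top bit of the field is set
        value -= 1 << width           # make it negative
    return value & ((1 << xlen) - 1)  # back to an unsigned xlen-bit image


def extz(value: int, width: int, xlen: int = 32) -> int:
    """Zero-extend the low `width` bits of value to xlen bits."""
    return value & ((1 << width) - 1) & ((1 << xlen) - 1)
```

For example, `exts(x, 8)` corresponds to c.sext.b (bit 7 is copied upward), and `extz(x, 16)` corresponds to c.zext.h (all bits above 15 are cleared).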
- -Operation:: -[source,sail] --- -X(rsdc) = EXTZ(X(rsdc)[15..0]); --- - -include::Zcb_footer.adoc[] - diff --git a/src/zc/c_zext_w.adoc b/src/zc/c_zext_w.adoc deleted file mode 100644 index 35684f9..0000000 --- a/src/zc/c_zext_w.adoc +++ /dev/null @@ -1,53 +0,0 @@ -<<< -[#insns-c_zext_w,reftext="Zero extend word, 16-bit encoding"] -==== c.zext.w - -Synopsis:: -Zero extend word, 16-bit encoding - -Mnemonic:: -c.zext.w _rd'/rs1'_ - -Encoding (RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x1, attr: ['C1'] }, - { bits: 3, name: 0x4, attr: ['C.ZEXT.W'] }, - { bits: 2, name: 0x3, attr: ['FUNCT2'] }, - { bits: 3, name: 'rd\'/rs1\'', attr: ['SRCDST'] }, - { bits: 3, name: 0x7 }, - { bits: 3, name: 0x4, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -Description:: -This instruction takes a single source/destination operand. -It zero-extends the least-significant word of the operand to XLEN bits by inserting zeros into all of -the bits more significant than 31. - -[NOTE] -==== - _rd'/rs1'_ is from the standard 8-register set x8-x15. -==== - -Prerequisites:: -Zba is also required. - -32-bit equivalent:: -[source,sail] --- -add.uw rd'/rs1', rd'/rs1', zero --- - -[NOTE] - - The SAIL module variable for _rd'/rs1'_ is called _rsdc_. - -Operation:: -[source,sail] --- -X(rsdc) = EXTZ(X(rsdc)[31..0]); --- - -include::Zcb_footer.adoc[] diff --git a/src/zc/changes_since_v0.50.adoc b/src/zc/changes_since_v0.50.adoc deleted file mode 100644 index b40626c..0000000 --- a/src/zc/changes_since_v0.50.adoc +++ /dev/null @@ -1,130 +0,0 @@ - -There are many changes since v0.50.1, which has been used for toolchain, spike, qemu and the CV32E41P implementation. - -The status of all of the instructions are in the tables. Note that _all_ subsets have been redefined. - -=== Load/store - -.Load/store -[options="header",width=100%] -|==================================================================================== -| v0.50 name | v0.70 Name | Encoding changed? | Semantics changed? 
| Notes -| C.LB | CM.LB | Y | N | uimm < 4 is "custom defined" -| C.LBU | CM.LBU | Y | N | uimm < 4 is "custom defined" -| C.LH | CM.LH | Y | N | uimm < 4 is "custom defined" -| C.LHU | CM.LHU | Y | N | uimm < 4 is "custom defined" -| C.SB | CM.SB | Y | N | uimm < 4 is "custom defined" -| C.SH | CM.SH | Y | N | uimm < 4 is "custom defined" -| N/A | C.LBU | N/A | N/A | CM.LBU with shorter uimm -| N/A | C.LH | N/A | N/A | CM.LH with shorter uimm -| N/A | C.LHU | N/A | N/A | CM.LHU with shorter uimm -| N/A | C.SB | N/A | N/A | CM.SB with shorter uimm -| N/A | C.SH | N/A | N/A | CM.SH with shorter uimm -|==================================================================================== - -=== Table jump - -.Table Jump -[options="header",width=100%] -|==================================================================================== -| v0.50 name | v0.70 Name | Encoding changed? | Semantics changed? | Notes -| C.TBLJAL | CM.JALT | Y | Y - exception model| Meaning of table index changed in the encoding, # removed from assembly syntax -| C.TBLJ | CM.J | Y | Y - exception model| Meaning of table index changed in the encoding, # removed from assembly syntax -| C.TBLJALM | N/A | N/A | N/A | Deleted -|==================================================================================== - -See this [commit](https://github.com/riscv/riscv-code-size-reduction/commit/8ba5b0fdf05d6fd5af118ba5301910d049abd1a8#diff-8d03bd23cf9ec0eb75984f7c6d4181aa9548acb5898dc9159514e24398076836) for the change in the table jump exception model. - -=== Double move - -.Double move -[options="header",width=100%] -|==================================================================================== -| v0.50 name | v0.70 Name | Encoding changed? | Semantics changed? 
| Notes -| C.MVA01S07 | CM.MVA01S | Y | N | -| N/A | CM.MVSA01 | N/A | N/A | New instruction -|==================================================================================== - -Note that the .E extension versions for the EABI will be specified in the future, and cannot yet be confirmed as the EABI is not frozen. - -=== Simple instructions - -.Simple instructions -[options="header",width=100%] -|==================================================================================== -| v0.50 name | v0.70 Name | Encoding changed? | Semantics changed? | Notes -| C.ZEXT.B | same | Y | N | -| C.ZEXT.H | same | Y | N | -| C.SEXT.B | same | Y | N | -| C.SEXT.H | same | Y | N | -| C.SEXT.W | same | Y | N | -| C.NOT | same | Y | N | -| C.MUL | same | N | N | unchanged -|==================================================================================== - -=== Push/pop - -All 32-bit forms are removed and all the 16-bit forms support 12 register lists (excluding {ra, s0-s10}): - -. {ra} -. {ra, s0} -. {ra, s0-s1} -. {ra, s0-s2} -. {ra, s0-s3} -. {ra, s0-s4} -. {ra, s0-s5} -. {ra, s0-s6} -. {ra, s0-s7} -. {ra, s0-s8} -. {ra, s0-s9} -. {ra, s0-s11} - -spimm length also updated. - -Note that the .E extension versions for the EABI will be specified in the future, and cannot yet be confirmed as the EABI is not frozen. - -.Push/pop instructions -[options="header",width=100%] -|==================================================================================== -| v0.50 name | v0.70 Name | Encoding changed? | Semantics changed? 
| Notes -| C.PUSH | CM.PUSH | Y | Y | areg_list no longer supported -| C.POP | CM.POP | Y | Y | -| C.POPRET | CM.POPRET | Y | Y | CM.POPRET doesn't return a value -| C.POPRET | CM.POPRETZ | Y | Y | separate encoding for return zero -|==================================================================================== - -=== Instructions in v0.50 but *not* in v0.70 - -These instructions can be left in the compiler as experimental, enabled with the following switches: - -[#compilerswitches] -.Compiler switches experimental instructions -[options="header",width=100%] -|============================================================================== -| Switch | Enabled instructions -| -mzce-lsgp | LWGP, SWGP, LDGP (RV64), SDGP (RV64) -| -mzce-muli | MULI -| -mzce-beqi | BEQI -| -mzce-bnei | BNEI -| -mzce-cdecbnez | C.DECBNEZ -| -mzce-decbnez | DECBNEZ -|============================================================================== - -==== 16-bit Instructions - -C.DECBNEZ - the encoding space for this has been used by all the CM.* instructions. -Therefore this instruction must be disabled in the compiler - unless an encoding is proposed. - -C.NEG - this is not very useful and can be deleted. - -==== 32-bit Instructions - -MULI - This is in custom-0, so can be kept unchanged. Early benchmarking results suggest it's not much use, and the encoding is expensive so it's unlikely to ever be included in an extension. 
- -BEQI, BNEI - these fill in the 2 gaps in the BRANCH encoding group - these encodings have not been allocated to other instructions, so these can stay unchanged - -DECBNEZ - this should be updated to match https://github.com/riscv/riscv-code-size-reduction/blob/master/Zce-release-candidate/Zcmd.pdf - -LWGP, SWGP, LDGP, SDGP - these overlap with C.FLD, C.FSD - -PUSH/POP/POPRET - delete all of these diff --git a/src/zc/cm_decbnez.adoc b/src/zc/cm_decbnez.adoc deleted file mode 100644 index 6dbbd77..0000000 --- a/src/zc/cm_decbnez.adoc +++ /dev/null @@ -1,51 +0,0 @@ -<<< -[#insns-cm_decbnez,reftext="Decrement and branch, 16-bit encoding"] -==== cm.decbnez: This is in the _development_ phase, for benchmarking and prototyping only - -Synopsis:: -Decrement and branch, 16-bit encoding - -Mnemonic:: -cm.decbnez _t0_, _offset_ - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x2, attr: ['C2'] }, - { bits: 6, name: 'imm[6|7|3:1|5]', attr: [] }, - { bits: 1, name: 0x1, attr: [] }, - { bits: 3, name: 'imm[4|9:8]', attr: [] }, - { bits: 1, name: 0x1, attr: [] }, - { bits: 3, name: 0x5, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -[NOTE] -==== - In the current proposal only t0 can be decremented, future versions may allow more registers -==== - -Description:: -This instruction decrements _t0_, and increments the PC by the sign extended immediate if _t0_ is zero *after* the decrement. - -Prerequisites:: -C or Zca - -32-bit equivalent:: -None - -Operation:: -[source,sail] --- - -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
- -t0 = 5; -X(t0) = X(t0)-1; -if (X(t0)==0) PC+=sext(imm); else PC+=2; - --- - -include::Zcmd_footer.adoc[] - diff --git a/src/zc/cm_jalt.adoc b/src/zc/cm_jalt.adoc deleted file mode 100644 index 9a5c392..0000000 --- a/src/zc/cm_jalt.adoc +++ /dev/null @@ -1,74 +0,0 @@ -<<< -[#insns-cm_jalt,reftext="Jump and link via table"] -==== cm.jalt - -Synopsis:: -jump via table with optional link - -Mnemonic:: -cm.jalt _index_ - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x2, attr: ['C2'] }, - { bits: 8, name: 'index', attr: [] }, - { bits: 3, name: 0x0, attr: [] }, - { bits: 3, name: 0x5, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -[NOTE] -==== - For this encoding to decode as _cm.jalt_, _index>=32_, otherwise it decodes as _cm.jt_, see <>. -==== -[NOTE] - - If JVT.mode = 0 (Jump Table Mode) then _cm.jalt_ behaves as specified here. If JVT.mode is a reserved value, then _cm.jalt_ is also reserved. In the future other defined values of JVT.mode may change the behaviour of _cm.jalt_. - -Assembly Syntax:: - -[source,sail] --- -cm.jalt index --- - -Description:: - -_cm.jalt_ reads an entry from the jump vector table in memory and jumps to the address that was read, linking to _ra_. - -For further information see <>. - -Prerequisites:: -None - -32-bit equivalent:: -No direct equivalent encoding exists. - -<<< - -[#insns-cm_jalt-SAIL,reftext="cm.jalt SAIL code"] -Operation:: - -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
- -# target_address is temporary internal state, it doesn't represent a real register -# InstMemory is byte indexed - -switch(XLEN) { - 32: table_address[XLEN-1:0] = JVT.base + (index<<2); - 64: table_address[XLEN-1:0] = JVT.base + (index<<3); -} - -//fetch from the jump table -target_address[XLEN-1:0] = InstMemory[table_address][XLEN-1:0]; - -jal ra, target_address[XLEN-1:0]&~0x1; - --- - -include::Zcmt_footer.adoc[] - diff --git a/src/zc/cm_jt.adoc b/src/zc/cm_jt.adoc deleted file mode 100644 index ba7e41c..0000000 --- a/src/zc/cm_jt.adoc +++ /dev/null @@ -1,74 +0,0 @@ -<<< -[#insns-cm_jt,reftext="Jump via table"] -==== cm.jt - -Synopsis:: -jump via table - -Mnemonic:: -cm.jt _index_ - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x2, attr: ['C2'] }, - { bits: 8, name: 'index', attr: [] }, - { bits: 3, name: 0x0, attr: [] }, - { bits: 3, name: 0x5, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -[NOTE] -==== - For this encoding to decode as _cm.jt_, _index<32_, otherwise it decodes as _cm.jalt_, see <>. -==== -[NOTE] - - If JVT.mode = 0 (Jump Table Mode) then _cm.jt_ behaves as specified here. If JVT.mode is a reserved value, then _cm.jt_ is also reserved. In the future other defined values of JVT.mode may change the behaviour of _cm.jt_. - -Assembly Syntax:: - -[source,sail] --- -cm.jt index --- - -Description:: - -_cm.jt_ reads an entry from the jump vector table in memory and jumps to the address that was read. - -For further information see <>. - -Prerequisites:: -None - -32-bit equivalent:: -No direct equivalent encoding exists. - -<<< - -[#insns-cm_jt-SAIL,reftext="cm.jt SAIL code"] -Operation:: - -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
- -# target_address is temporary internal state, it doesn't represent a real register -# InstMemory is byte indexed - -switch(XLEN) { - 32: table_address[XLEN-1:0] = JVT.base + (index<<2); - 64: table_address[XLEN-1:0] = JVT.base + (index<<3); -} - -//fetch from the jump table -target_address[XLEN-1:0] = InstMemory[table_address][XLEN-1:0]; - -j target_address[XLEN-1:0]&~0x1; - --- - -include::Zcmt_footer.adoc[] - diff --git a/src/zc/cm_lb.adoc b/src/zc/cm_lb.adoc deleted file mode 100644 index 4aefffc..0000000 --- a/src/zc/cm_lb.adoc +++ /dev/null @@ -1,49 +0,0 @@ -<<< -[#insns-cm_lb,reftext="Load signed byte, 16-bit encoding"] -==== cm.lb - -Synopsis:: -Load signed byte, 16-bit encoding - -Mnemonic:: -cm.lb _rd'_, _uimm_(_rs1'_) - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x0, attr: ['C0'] }, - { bits: 3, name: 'rd\'' }, - { bits: 2, name: 'uimm[2:1]' }, - { bits: 3, name: 'rs1\'' }, - { bits: 2, name: 'uimm[0|3]' }, - { bits: 1, name: 0x0 }, - { bits: 3, name: 0x1, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -include::cm_lbsb_imm_offset.adoc[] - -Description:: -This instruction loads a byte from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting byte is sign extended to XLEN bits and is written to _rd'_. - -[NOTE] -==== - _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. -==== - -Prerequisites:: -None - -32-bit equivalent:: -<> - -Operation:: -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
- -X(rdc) = EXTS(mem[X(rs1c)+EXTZ(uimm)][7..0]); --- - -include::Zcmb_footer.adoc[] diff --git a/src/zc/cm_lbsb_imm_offset.adoc b/src/zc/cm_lbsb_imm_offset.adoc deleted file mode 100644 index 4df7702..0000000 --- a/src/zc/cm_lbsb_imm_offset.adoc +++ /dev/null @@ -1,9 +0,0 @@ - -The immediate offset is formed as follows: -[source,sail] --- - uimm[31:4] = 0; - uimm[3] = encoding[10]; - uimm[2:1] = encoding[6:5]; - uimm[0] = encoding[11]; --- diff --git a/src/zc/cm_lbu.adoc b/src/zc/cm_lbu.adoc deleted file mode 100644 index 601ce3f..0000000 --- a/src/zc/cm_lbu.adoc +++ /dev/null @@ -1,52 +0,0 @@ -<<< -[#insns-cm_lbu,reftext="Load unsigned byte, 16-bit encoding"] -==== cm.lbu - -Synopsis:: -Load unsigned byte, 16-bit encoding - -Mnemonic:: -cm.lbu _rd'_, _uimm_(_rs1'_) - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x2, attr: ['C2'] }, - { bits: 3, name: 'rd\'' }, - { bits: 2, name: 'uimm[2:1]' }, - { bits: 3, name: 'rs1\'' }, - { bits: 2, name: 'uimm[0|3]' }, - { bits: 1, name: 0x0 }, - { bits: 3, name: 0x1, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -[NOTE] - If _uimm < 4_ the encoding is designated for custom use, as the functionality overlaps with <>. - -include::cm_lbsb_imm_offset.adoc[] - -Description:: -This instruction loads a byte from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting byte is zero extended to XLEN bits and is written to _rd'_. - -[NOTE] -==== - _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. -==== - -Prerequisites:: -None - -32-bit equivalent:: -<> - -Operation:: -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
- -X(rdc) = EXTZ(mem[X(rs1c)+EXTZ(uimm)][7..0]); --- - -include::Zcmb_footer.adoc[] diff --git a/src/zc/cm_lh.adoc b/src/zc/cm_lh.adoc deleted file mode 100644 index 4a23050..0000000 --- a/src/zc/cm_lh.adoc +++ /dev/null @@ -1,53 +0,0 @@ -<<< -[#insns-cm_lh,reftext="Load signed halfword, 16-bit encoding"] -==== cm.lh - -Synopsis:: -Load signed halfword, 16-bit encoding - -Mnemonic:: -cm.lh _rd'_, _uimm_(_rs1'_) - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x0, attr: ['C0'] }, - { bits: 3, name: 'rd\'' }, - { bits: 2, name: 'uimm[2:1]' }, - { bits: 3, name: 'rs1\'' }, - { bits: 2, name: 'uimm[4:3]' }, - { bits: 1, name: 0x1 }, - { bits: 3, name: 0x1, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -[NOTE] - If _uimm < 4_ the encoding is designated for custom use, as the functionality overlaps with <>. - -include::cm_lhsh_imm_offset.adoc[] - -Description:: -This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting halfword is sign extended to XLEN bits and is written to _rd'_. - -[NOTE] -==== - _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. -==== - -Prerequisites:: -None - -32-bit equivalent:: -<> - -Operation:: -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
- -X(rdc) = EXTS(load_mem[X(rs1c)+EXTZ(uimm)][15..0]); --- - -include::Zcmb_footer.adoc[] - diff --git a/src/zc/cm_lhsh_imm_offset.adoc b/src/zc/cm_lhsh_imm_offset.adoc deleted file mode 100644 index 1aa6bc8..0000000 --- a/src/zc/cm_lhsh_imm_offset.adoc +++ /dev/null @@ -1,9 +0,0 @@ - -The immediate offset is formed as follows: -[source,sail] --- - uimm[31:5] = 0; - uimm[4:3] = encoding[11:10]; - uimm[2:1] = encoding[6:5]; - uimm[0] = 0; --- diff --git a/src/zc/cm_lhu.adoc b/src/zc/cm_lhu.adoc deleted file mode 100644 index 6818ef1..0000000 --- a/src/zc/cm_lhu.adoc +++ /dev/null @@ -1,55 +0,0 @@ -<<< -[#insns-cm_lhu,reftext="Load unsigned halfword, 16-bit encoding"] -==== cm.lhu - -Synopsis:: -Load unsigned halfword, 16-bit encoding - -Mnemonic:: -cm.lhu _rd'_, _uimm_(_rs1'_) - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x2, attr: ['C2'] }, - { bits: 3, name: 'rd\'' }, - { bits: 2, name: 'uimm[2:1]' }, - { bits: 3, name: 'rs1\'' }, - { bits: 2, name: 'uimm[4:3]' }, - { bits: 1, name: 0x1 }, - { bits: 3, name: 0x1, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -[NOTE] -==== - If _uimm < 4_ the encoding is designated for custom use, as the functionality overlaps with <>. -==== - -include::cm_lhsh_imm_offset.adoc[] - -Description:: -This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting halfword is zero extended to XLEN bits and is written to _rd'_. - -[NOTE] -==== - _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. -==== - -Prerequisites:: -None - -32-bit equivalent:: -<> - -Operation:: -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
- -X(rdc) = EXTZ(load_mem[X(rs1c)+EXTZ(uimm)][15..0]); --- - -include::Zcmb_footer.adoc[] - diff --git a/src/zc/cm_mva01s.adoc b/src/zc/cm_mva01s.adoc deleted file mode 100644 index 5b6d009..0000000 --- a/src/zc/cm_mva01s.adoc +++ /dev/null @@ -1,63 +0,0 @@ -<<< -[#insns-cm_mva01s,reftext="Move two s0-s7 registers into a0-a1"] -==== cm.mva01s - -Synopsis:: -Move two s0-s7 registers into a0-a1 - -Mnemonic:: -cm.mva01s _r1s'_, _r2s'_ - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x2, attr: ['C2'] }, - { bits: 3, name: 'r2s\'', attr: [] }, - { bits: 2, name: 0x3, attr: [] }, - { bits: 3, name: 'r1s\'', attr: [] }, - { bits: 3, name: 0x3, attr: [] }, - { bits: 3, name: 0x5, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -Assembly Syntax:: - -[source,sail] --- -cm.mva01s r1s', r2s' --- - -Description:: -This instruction moves _r1s'_ into _a0_ and _r2s'_ into _a1_. -The execution is atomic, so it is not possible to observe state where only one of _a0_ or _a1_ have been updated. - -The encoding uses _sreg_ number specifiers instead of _xreg_ number specifiers to save encoding space. -The mapping between them is specified in the pseudo-code below. - -[NOTE] -==== - The _s_ register mapping is taken from the UABI, and may not match the currently unratified EABI. _cm.mva01s.e_ may be included in the future. -==== - -Prerequisites:: -None - -32-bit equivalent:: -No direct equivalent encoding exists. - -Operation:: -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
-if (RV32E && (r1sc>1 || r2sc>1)) { - reserved(); -} -xreg1 = {r1sc[2:1]>0,r1sc[2:1]==0,r1sc[2:0]}; -xreg2 = {r2sc[2:1]>0,r2sc[2:1]==0,r2sc[2:0]}; -X[10] = X[xreg1]; -X[11] = X[xreg2]; --- - -include::Zcmp_footer.adoc[] - diff --git a/src/zc/cm_mvsa01.adoc b/src/zc/cm_mvsa01.adoc deleted file mode 100644 index 7c4f6e2..0000000 --- a/src/zc/cm_mvsa01.adoc +++ /dev/null @@ -1,68 +0,0 @@ -<<< -[#insns-cm_mvsa01,reftext="Move a0-a1 into two different s0-s7 registers"] -==== cm.mvsa01 - -Synopsis:: -Move a0-a1 into two registers of s0-s7 - -Mnemonic:: -cm.mvsa01 _r1s'_, _r2s'_ - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x2, attr: ['C2'] }, - { bits: 3, name: 'r2s\'', attr: [] }, - { bits: 2, name: 0x1, attr: [] }, - { bits: 3, name: 'r1s\'', attr: [] }, - { bits: 3, name: 0x3, attr: [] }, - { bits: 3, name: 0x5, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -[NOTE] -==== - For the encoding to be legal _r1s'_ != _r2s'_. -==== - -Assembly Syntax:: - -[source,sail] --- -cm.mvsa01 r1s', r2s' --- - -Description:: -This instruction moves _a0_ into _r1s'_ and _a1_ into _r2s'_. _r1s'_ and _r2s'_ must be different. -The execution is atomic, so it is not possible to observe state where only one of _r1s'_ or _r2s'_ has been updated. - -The encoding uses _sreg_ number specifiers instead of _xreg_ number specifiers to save encoding space. -The mapping between them is specified in the pseudo-code below. - -[NOTE] -==== - The _s_ register mapping is taken from the UABI, and may not match the currently unratified EABI. _cm.mvsa01.e_ may be included in the future. -==== - -Prerequisites:: -None - -32-bit equivalent:: -No direct equivalent encoding exists. - -Operation:: -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
-if (RV32E && (r1sc>1 || r2sc>1)) { - reserved(); -} -xreg1 = {r1sc[2:1]>0,r1sc[2:1]==0,r1sc[2:0]}; -xreg2 = {r2sc[2:1]>0,r2sc[2:1]==0,r2sc[2:0]}; -X[xreg1] = X[10]; -X[xreg2] = X[11]; --- - -include::Zcmp_footer.adoc[] - diff --git a/src/zc/cm_pop.adoc b/src/zc/cm_pop.adoc deleted file mode 100644 index fb9c880..0000000 --- a/src/zc/cm_pop.adoc +++ /dev/null @@ -1,49 +0,0 @@ -<<< -[#insns-cm_pop,reftext="Pop registers, deallocate stack frame."] -==== cm.pop - -Synopsis:: -Destroy stack frame: load ra and 0 to 12 saved registers from the stack frame, deallocate the stack frame. - -Mnemonic:: -cm.pop _{reg_list}, stack_adj_ - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x2, attr: ['C2'] }, - { bits: 2, name: 'spimm\[5:4\]', attr: [] }, - { bits: 4, name: 'rlist', attr: [] }, - { bits: 5, name: 0x1a, attr: [] }, - { bits: 3, name: 0x5, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -[NOTE] -==== - _rlist_ values 0 to 3 are reserved for a future EABI variant called _cm.pop.e_ -==== -Assembly Syntax:: - -[source,sail] --- -cm.pop {reg_list}, stack_adj -cm.pop {xreg_list}, stack_adj --- - -include::variable_def.adoc[] -include::pushpop_vars.adoc[] - -<<< - -Description:: -This instruction pops (loads) the registers in _reg_list_ from stack memory, -and then adjusts the stack pointer by _stack_adj_. - -include::pushpop_extra_info.adoc[] -include::cm_pop_popret_loads_pseudo_code.adoc[] -include::cm_pop_pseudo_code.adoc[] - -include::Zcmp_footer.adoc[] - diff --git a/src/zc/cm_pop_popret_loads_pseudo_code.adoc b/src/zc/cm_pop_popret_loads_pseudo_code.adoc deleted file mode 100644 index af46b9d..0000000 --- a/src/zc/cm_pop_popret_loads_pseudo_code.adoc +++ /dev/null @@ -1,25 +0,0 @@ - -Operation:: - -The first section of pseudo-code may be executed multiple times before the instruction successfully completes. - -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
- -if (XLEN==32) bytes=4; else bytes=8; - -addr=sp+stack_adj-bytes; -for(i in 27,26,25,24,23,22,21,20,19,18,9,8,1) { - //if register i is in xreg_list - if (xreg_list[i]) { - switch(bytes) { - 4: asm("lw x[i], 0(addr)"); - 8: asm("ld x[i], 0(addr)"); - } - addr-=bytes; - } -} --- - -The final section of pseudo-code executes atomically, and only executes if the section above completes without any exceptions or interrupts. diff --git a/src/zc/cm_pop_pseudo_code.adoc b/src/zc/cm_pop_pseudo_code.adoc deleted file mode 100644 index 0cd38a0..0000000 --- a/src/zc/cm_pop_pseudo_code.adoc +++ /dev/null @@ -1,7 +0,0 @@ - -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. - -sp+=stack_adj; --- diff --git a/src/zc/cm_popret.adoc b/src/zc/cm_popret.adoc deleted file mode 100644 index 7650e6a..0000000 --- a/src/zc/cm_popret.adoc +++ /dev/null @@ -1,50 +0,0 @@ -<<< -[#insns-cm_popret,reftext="Pop registers, deallocate stack frame, return."] -==== cm.popret - -Synopsis:: -Destroy stack frame: load ra and 0 to 12 saved registers from the stack frame, deallocate the stack frame, return to ra. - -Mnemonic:: -cm.popret _{reg_list}, stack_adj_ - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x2, attr: ['C2'] }, - { bits: 2, name: 'spimm\[5:4\]', attr: [] }, - { bits: 4, name: 'rlist', attr: [] }, - { bits: 5, name: 0x1e, attr: [] }, - { bits: 3, name: 0x5, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -[NOTE] -==== - _rlist_ values 0 to 3 are reserved for a future EABI variant called _cm.popret.e_ -==== - -Assembly Syntax:: - -[source,sail] --- -cm.popret {reg_list}, stack_adj -cm.popret {xreg_list}, stack_adj --- - -include::variable_def.adoc[] -include::pushpop_vars.adoc[] - -<<< - -Description:: -This instruction pops (loads) the registers in _reg_list_ from stack memory, - adjusts the stack pointer by _stack_adj_ and then returns to _ra_. 
- -include::pushpop_extra_info.adoc[] -include::cm_pop_popret_loads_pseudo_code.adoc[] -include::cm_popret_pseudo_code.adoc[] - -include::Zcmp_footer.adoc[] - diff --git a/src/zc/cm_popret_pseudo_code.adoc b/src/zc/cm_popret_pseudo_code.adoc deleted file mode 100644 index ecf60f2..0000000 --- a/src/zc/cm_popret_pseudo_code.adoc +++ /dev/null @@ -1,9 +0,0 @@ - - -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. - -sp+=stack_adj; -asm("ret"); --- diff --git a/src/zc/cm_popretz.adoc b/src/zc/cm_popretz.adoc deleted file mode 100644 index d7e3bb8..0000000 --- a/src/zc/cm_popretz.adoc +++ /dev/null @@ -1,49 +0,0 @@ -<<< -[#insns-cm_popretz,reftext="Pop registers, deallocate stack frame, return zero."] -==== cm.popretz - -Synopsis:: -Destroy stack frame: load ra and 0 to 12 saved registers from the stack frame, deallocate the stack frame, move zero into a0, return to ra. - -Mnemonic:: -cm.popretz _{reg_list}, stack_adj_ - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x2, attr: ['C2'] }, - { bits: 2, name: 'spimm\[5:4\]', attr: [] }, - { bits: 4, name: 'rlist', attr: [] }, - { bits: 5, name: 0x1c, attr: [] }, - { bits: 3, name: 0x5, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -[NOTE] -==== - _rlist_ values 0 to 3 are reserved for a future EABI variant called _cm.popretz.e_ -==== - -Assembly Syntax:: - -[source,sail] --- -cm.popretz {reg_list}, stack_adj -cm.popretz {xreg_list}, stack_adj --- - -include::pushpop_vars.adoc[] - -<<< - -Description:: -This instruction pops (loads) the registers in _reg_list_ from stack memory, - adjusts the stack pointer by _stack_adj_, moves zero into a0 and then returns to _ra_. 
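
The description above can be sketched as a small software model. This is only an illustration, assuming the load order and address calculation given in the shared pop/popret pseudo-code; the names `cm_popretz`, `regs` and `mem` are hypothetical and do not come from the specification.

```python
# Hypothetical model of cm.popretz; all names here are illustrative only.
# Register order follows the spec pseudo-code: x27..x18, x9, x8, x1.
POP_ORDER = (27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 9, 8, 1)


def cm_popretz(regs, mem, xreg_list, stack_adj, xlen=32):
    """Load the listed registers, zero a0, deallocate the frame, return to ra.

    regs: dict mapping x-register number -> value (sp is x2, a0 is x10)
    mem: dict mapping address -> saved register value
    xreg_list: set of x-register numbers to restore
    Returns the address execution continues at (the restored ra).
    """
    nbytes = 4 if xlen == 32 else 8
    # Loads start just below the top of the frame and walk downwards.
    addr = regs[2] + stack_adj - nbytes
    for i in POP_ORDER:
        if i in xreg_list:
            regs[i] = mem[addr]
            addr -= nbytes
    # Final section: li a0, 0; sp += stack_adj; ret
    regs[10] = 0
    regs[2] += stack_adj
    return regs[1]
```

The model treats memory as a dictionary of whole registers, so it ignores byte ordering, traps and re-execution; it only shows the order of the architectural updates.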
- -include::pushpop_extra_info.adoc[] -include::cm_pop_popret_loads_pseudo_code.adoc[] -include::cm_popretz_pseudo_code.adoc[] - -include::Zcmp_footer.adoc[] - diff --git a/src/zc/cm_popretz_pseudo_code.adoc b/src/zc/cm_popretz_pseudo_code.adoc deleted file mode 100644 index 6aac95c..0000000 --- a/src/zc/cm_popretz_pseudo_code.adoc +++ /dev/null @@ -1,14 +0,0 @@ - - -[NOTE] - - The _li a0, 0_ *could* be executed more than once, but is included in the atomic section for convenience. - -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. - -asm("li a0, 0"); -sp+=stack_adj; -asm("ret"); --- diff --git a/src/zc/cm_push.adoc b/src/zc/cm_push.adoc deleted file mode 100644 index 13b93fe..0000000 --- a/src/zc/cm_push.adoc +++ /dev/null @@ -1,49 +0,0 @@ -<<< -[#insns-cm_push,reftext="Create stack frame: push registers, allocate additional stack space."] -==== cm.push - -Synopsis:: -Create stack frame: store ra and 0 to 12 saved registers to the stack frame, optionally allocate additional stack space. - -Mnemonic:: -cm.push _{reg_list}, -stack_adj_ - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x2, attr: ['C2'] }, - { bits: 2, name: 'spimm\[5:4\]', attr: [] }, - { bits: 4, name: 'rlist', attr: [] }, - { bits: 5, name: 0x18, attr: [] }, - { bits: 3, name: 0x5, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -[NOTE] -==== - _rlist_ values 0 to 3 are reserved for a future EABI variant called _cm.push.e_ -==== - -Assembly Syntax:: - -[source,sail] --- -cm.push {reg_list}, -stack_adj -cm.push {xreg_list}, -stack_adj --- - -include::variable_def.adoc[] -include::pushpop_vars.adoc[] - -<<< -Description:: -This instruction pushes (stores) the registers in _reg_list_ to the memory below the stack pointer, -and then creates the stack frame by decrementing the stack pointer by _stack_adj_, -including any additional stack space requested by the value of _spimm_. 
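
As a rough illustration of this description, the following sketch models the stores and the stack pointer adjustment. It assumes the store order and addressing from the push pseudo-code in this extension; the names `cm_push`, `regs` and `mem` are hypothetical and are not part of the specification.

```python
# Hypothetical model of cm.push; all names here are illustrative only.
# Register order follows the spec pseudo-code: x27..x18, x9, x8, x1.
PUSH_ORDER = (27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 9, 8, 1)


def cm_push(regs, mem, xreg_list, stack_adj, xlen=32):
    """Store the listed registers below sp, then decrement sp by stack_adj.

    regs: dict mapping x-register number -> value (sp is x2)
    mem: dict mapping address -> stored register value
    xreg_list: set of x-register numbers to save
    """
    nbytes = 4 if xlen == 32 else 8
    # Stores start immediately below the current stack pointer.
    addr = regs[2] - nbytes
    for i in PUSH_ORDER:
        if i in xreg_list:
            mem[addr] = regs[i]
            addr -= nbytes
    # The adjustment is applied last, after all stores have completed.
    regs[2] -= stack_adj
    return regs, mem
```

For example, pushing `{ra, s0-s1}` with `stack_adj` of 16 from `sp = 0x1000` writes three words at `0xFFC`, `0xFF8` and `0xFF4` and leaves `sp` at `0xFF0`. Traps, interrupts and re-execution are deliberately not modelled.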
- -include::pushpop_extra_info.adoc[] -include::cm_push_stores_pseudo_code.adoc[] -include::cm_push_pseudo_code.adoc[] - -include::Zcmp_footer.adoc[] diff --git a/src/zc/cm_push_pseudo_code.adoc b/src/zc/cm_push_pseudo_code.adoc deleted file mode 100644 index 8500f0e..0000000 --- a/src/zc/cm_push_pseudo_code.adoc +++ /dev/null @@ -1,7 +0,0 @@ - -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. - -sp-=stack_adj; --- diff --git a/src/zc/cm_push_stores_pseudo_code.adoc b/src/zc/cm_push_stores_pseudo_code.adoc deleted file mode 100644 index 46771dd..0000000 --- a/src/zc/cm_push_stores_pseudo_code.adoc +++ /dev/null @@ -1,25 +0,0 @@ - -Operation:: - -The first section of pseudo-code may be executed multiple times before the instruction successfully completes. - -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. - -if (XLEN==32) bytes=4; else bytes=8; - -addr=sp-bytes; -for(i in 27,26,25,24,23,22,21,20,19,18,9,8,1) { - //if register i is in xreg_list - if (xreg_list[i]) { - switch(bytes) { - 4: asm("sw x[i], 0(addr)"); - 8: asm("sd x[i], 0(addr)"); - } - addr-=bytes; - } -} --- - -The final section of pseudo-code executes atomically, and only executes if the section above completes without any exceptions or interrupts. diff --git a/src/zc/cm_sb.adoc b/src/zc/cm_sb.adoc deleted file mode 100644 index b3e45ba..0000000 --- a/src/zc/cm_sb.adoc +++ /dev/null @@ -1,54 +0,0 @@ -<<< -[#insns-cm_sb,reftext="Store byte, 16-bit encoding"] -==== cm.sb - -Synopsis:: -Store byte, 16-bit encoding - -Mnemonic:: -cm.sb _rs2'_, _uimm_(_rs1'_) - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x0, attr: ['C0'] }, - { bits: 3, name: 'rs2\'' }, - { bits: 2, name: 'uimm[2:1]' }, - { bits: 3, name: 'rs1\'' }, - { bits: 2, name: 'uimm[0|3]' }, - { bits: 1, name: 0x0 }, - { bits: 3, name: 0x5, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... 
- -[NOTE] -==== - If _uimm < 4_ the encoding is designated for custom use, as the functionality overlaps with <>. -==== - -include::cm_lbsb_imm_offset.adoc[] - -Description:: -This instruction stores the least significant byte of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. - -[NOTE] -==== - _rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. -==== - -Prerequisites:: -None - -32-bit equivalent:: -<> - -Operation:: -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. - -mem[X(rs1c)+EXTZ(uimm)][7..0] = X(rs2c) --- - -include::Zcmb_footer.adoc[] diff --git a/src/zc/cm_sh.adoc b/src/zc/cm_sh.adoc deleted file mode 100644 index 2464114..0000000 --- a/src/zc/cm_sh.adoc +++ /dev/null @@ -1,55 +0,0 @@ -<<< -[#insns-cm_sh,reftext="Store halfword, 16-bit encoding"] -==== cm.sh - -Synopsis:: -Store halfword, 16-bit encoding - -Mnemonic:: -cm.sh _rs2'_, _uimm_(_rs1'_) - -Encoding (RV32, RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 2, name: 0x0, attr: ['C0'] }, - { bits: 3, name: 'rs2\'' }, - { bits: 2, name: 'uimm[2:1]' }, - { bits: 3, name: 'rs1\'' }, - { bits: 2, name: 'uimm[4:3]' }, - { bits: 1, name: 0x1 }, - { bits: 3, name: 0x5, attr: ['FUNCT3'] }, -],config:{bits:16}} -.... - -[NOTE] -==== - If _uimm < 4_ the encoding is designated for custom use, as the functionality overlaps with <>. -==== - -include::cm_lhsh_imm_offset.adoc[] - -Description:: -This instruction stores the least significant halfword of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. - -[NOTE] -==== - _rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. -==== - -Prerequisites:: -None - -32-bit equivalent:: -<> - -Operation:: -[source,sail] --- -//This is not SAIL, it's pseudo-code. The SAIL hasn't been written yet. 
- -mem[X(rs1c)+EXTZ(uimm)][15..0] = X(rs2c) --- - -include::Zcmb_footer.adoc[] - diff --git a/src/zc/example.bib b/src/zc/example.bib deleted file mode 100644 index dd4ca0b..0000000 --- a/src/zc/example.bib +++ /dev/null @@ -1,40 +0,0 @@ -@inproceedings{riscI-isca1981, - title = {{RISC I}: {A} Reduced Instruction Set {VLSI} Computer}, - author = {David A. Patterson and Carlo H. S\'{e}quin}, - booktitle = {ISCA}, - location = {Minneapolis, Minnesota, USA}, - pages = {443-458}, - year = {1981} -} - -@InProceedings{Katevenis:1983, - author = {Katevenis, Manolis G.H. and Sherburne,Jr., Robert W. and Patterson, David A. and S{\'e}quin, Carlo H.}, - title = {The {RISC II} micro-architecture}, - booktitle = {Proceedings VLSI 83 Conference}, - year = 1983, - month = {August}} - -@inproceedings{Ungar:1984, - author = {David Ungar and Ricki Blau and Peter Foley and Dain Samples - and David Patterson}, - title = {Architecture of {SOAR}: {Smalltalk} on a {RISC}}, - booktitle = {ISCA}, - address = {Ann Arbor, MI}, - year = {1984}, - pages = {188--197} -} - -@Article{spur-jsscc1989, - author = {David D. Lee and Shing I. Kong and Mark D. Hill and - George S. Taylor and David A. Hodges and Randy - H. Katz and David A. Patterson}, - title = {A {VLSI} Chip Set for a Multiprocessor - Workstation--{Part I}: An {RISC} Microprocessor with - Coprocessor Interface and Support for Symbolic - Processing}, - journal = {IEEE JSSC}, - year = 1989, - volume = 24, - number = 6, - pages = {1688--1698}, - month = {December}} diff --git a/src/zc/jvt_csr.adoc b/src/zc/jvt_csr.adoc deleted file mode 100644 index 5484db3..0000000 --- a/src/zc/jvt_csr.adoc +++ /dev/null @@ -1,68 +0,0 @@ -<<< -[#csrs-jvt,reftext="JVT CSR, table jump base vector and control register"] -==== JVT CSR - -Synopsis:: -Table jump base vector and control register - -Address:: -0x0017 - -Permissions:: -URW - -Format (RV32):: -[wavedrom, , svg] -.... 
-{reg:[ - { bits: 6, name: 'mode', attr: ['6'] }, - { bits: 26, name: 'base[XLEN-1:6] (WARL)', attr: ['XLEN-6'] }, -],config:{bits:32}} -.... - -Format (RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 6, name: 'mode', attr: ['6'] }, - { bits: 58, name: 'base[XLEN-1:6] (WARL)', attr: ['XLEN-6'] }, -],config:{bits:64}} -.... - -Description:: - -The _JVT_ register is an XLEN-bit *WARL* read/write register that holds the jump table configuration, consisting of the jump table base address (BASE) and the jump table mode (MODE). - -If <> is implemented then _JVT_ must also be implemented, but can contain a read-only value. If _JVT_ is writable, the set of values the register may hold can vary by implementation. The value in the BASE field must always be aligned on a 64-byte boundary. - -_JVT.base_ is a virtual address, whenever virtual memory is enabled. - -The memory pointed to by _JVT.base_ is treated as instruction memory for the purpose of executing table jump instructions, implying execute access permission. - -[#JVT-config-table] -._JVT.mode_ definition -[width="60%",options=header] -|============================================================================================= -| JVT.mode | Comment -| 000000 | Jump table mode -| others | *reserved for future standard use* -|============================================================================================= - -_JVT.mode_ is a *WARL* field, so can only be programmed to modes which are implemented. Therefore the discovery mechanism is to -attempt to program different modes and read back the values to see which are available. Jump table mode _must_ be implemented. - -[NOTE] -==== - in future the RISC-V Unified Discovery method will report the available modes. -==== - -Architectural State:: - -_JVT_ adds architectural state to the system software context (such as an OS process), therefore must be saved/restored on context switches. 
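
The field layout above, combined with the table addressing used by the _cm.jt_ and _cm.jalt_ pseudo-code, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `jvt_fields` and `table_entry_address` are hypothetical, and any non-zero mode is simply rejected because only mode 0 is defined.

```python
# Hypothetical helpers; names are illustrative and not taken from the spec.

def jvt_fields(jvt, xlen=32):
    """Split a JVT value into (base, mode).

    mode occupies bits 5:0; base occupies bits XLEN-1:6, so the base
    address is always aligned on a 64-byte boundary.
    """
    mask = (1 << xlen) - 1
    jvt &= mask
    mode = jvt & 0x3F
    base = jvt & ~0x3F & mask
    return base, mode


def table_entry_address(jvt, index, xlen=32):
    """Address of jump table entry `index` (entries are XLEN bits wide)."""
    base, mode = jvt_fields(jvt, xlen)
    if mode != 0:
        # Only jump table mode (000000) is defined; other values are reserved.
        raise ValueError("reserved JVT.mode")
    shift = 2 if xlen == 32 else 3
    return base + (index << shift)
```

With a JVT value of `0x80001040`, entry 3 is at `0x8000104C` on RV32 and at `0x80001058` on RV64.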
- -State Enable:: - -If the Smstateen extension is implemented, then bit 2 in _mstateen0_, _sstateen0_, and _hstateen0_ is implemented. If bit 2 of a controlling _stateen0_ CSR is zero, then access to the _JVT_ CSR and execution of a _cm.jalt_ or _cm.jt_ instruction by a lower privilege level results in an Illegal Instruction trap (or, if appropriate, a Virtual Instruction trap). - -include::Zcmt_footer.adoc[] - diff --git a/src/zc/pushpop.adoc b/src/zc/pushpop.adoc deleted file mode 100644 index 8c706cf..0000000 --- a/src/zc/pushpop.adoc +++ /dev/null @@ -1,354 +0,0 @@ -<<< - -[#insns-pushpop,reftext="PUSH/POP Register Instructions"] -=== PUSH/POP register instructions - -These instructions are collectively referred to as PUSH/POP: - -* <<#insns-cm_push>> -* <<#insns-cm_pop>> -* <<#insns-cm_popret>> -* <<#insns-cm_popretz>> - -The term PUSH refers to _cm.push_. - -The term POP refers to _cm.pop_. - -The term POPRET refers to _cm.popret and cm.popretz_. - -Common details for these instructions are in this section. - -==== PUSH/POP functional overview - -PUSH, POP, POPRET are used to reduce the size of function prologues and epilogues. - -. The PUSH instruction -** adjusts the stack pointer to create the stack frame -** pushes (stores) the registers specified in the register list to the stack frame - -. The POP instruction -** pops (loads) the registers in the register list from the stack frame -** adjusts the stack pointer to destroy the stack frame - -. The POPRET instructions -** pop (load) the registers in the register list from the stack frame -** _cm.popretz_ also moves zero into _a0_ as the return value -** adjust the stack pointer to destroy the stack frame -** execute a _ret_ instruction to return from the function - -<<< -==== Example usage - -This example gives an illustration of the use of PUSH and POPRET. 
- -The function _processMarkers_ in the EMBench benchmark picojpeg in the following file on github: https://github.com/embench/embench-iot/blob/master/src/picojpeg/libpicojpeg.c[libpicojpeg.c] - -The prologue and epilogue compile with GCC10 to: - -[source,SAIL] ----- - - 0001098a : - 1098a: 711d addi sp,sp,-96 ;#cm.push(1) - 1098c: c8ca sw s2,80(sp) ;#cm.push(2) - 1098e: c6ce sw s3,76(sp) ;#cm.push(3) - 10990: c4d2 sw s4,72(sp) ;#cm.push(4) - 10992: ce86 sw ra,92(sp) ;#cm.push(5) - 10994: cca2 sw s0,88(sp) ;#cm.push(6) - 10996: caa6 sw s1,84(sp) ;#cm.push(7) - 10998: c2d6 sw s5,68(sp) ;#cm.push(8) - 1099a: c0da sw s6,64(sp) ;#cm.push(9) - 1099c: de5e sw s7,60(sp) ;#cm.push(10) - 1099e: dc62 sw s8,56(sp) ;#cm.push(11) - 109a0: da66 sw s9,52(sp) ;#cm.push(12) - 109a2: d86a sw s10,48(sp);#cm.push(13) - 109a4: d66e sw s11,44(sp);#cm.push(14) -... - 109f4: 4501 li a0,0 ;#cm.popretz(1) - 109f6: 40f6 lw ra,92(sp) ;#cm.popretz(2) - 109f8: 4466 lw s0,88(sp) ;#cm.popretz(3) - 109fa: 44d6 lw s1,84(sp) ;#cm.popretz(4) - 109fc: 4946 lw s2,80(sp) ;#cm.popretz(5) - 109fe: 49b6 lw s3,76(sp) ;#cm.popretz(6) - 10a00: 4a26 lw s4,72(sp) ;#cm.popretz(7) - 10a02: 4a96 lw s5,68(sp) ;#cm.popretz(8) - 10a04: 4b06 lw s6,64(sp) ;#cm.popretz(9) - 10a06: 5bf2 lw s7,60(sp) ;#cm.popretz(10) - 10a08: 5c62 lw s8,56(sp) ;#cm.popretz(11) - 10a0a: 5cd2 lw s9,52(sp) ;#cm.popretz(12) - 10a0c: 5d42 lw s10,48(sp);#cm.popretz(13) - 10a0e: 5db2 lw s11,44(sp);#cm.popretz(14) - 10a10: 6125 addi sp,sp,96 ;#cm.popretz(15) - 10a12: 8082 ret ;#cm.popretz(16) ----- - -<<< - -with the GCC option _-msave-restore_ the output is the following: - -[source,SAIL] ----- -0001080e : - 1080e: 73a012ef jal t0,11f48 <__riscv_save_12> - 10812: 1101 addi sp,sp,-32 -... - 10862: 4501 li a0,0 - 10864: 6105 addi sp,sp,32 - 10866: 71e0106f j 11f84 <__riscv_restore_12> ----- - -with PUSH/POPRET this reduces to - -[source,SAIL] ----- -0001080e : - 1080e: b8fa cm.push {ra,s0-s11},-96 -... 
- 10866: bcfa cm.popretz {ra,s0-s11}, 96 ----- - -The prologue / epilogue reduce from 60-bytes in the original code, to 14-bytes with _-msave-restore_, -and to 4-bytes with PUSH and POPRET. -As well as reducing the code-size PUSH and POPRET eliminate the branches from -calling the millicode _save/restore_ routines and so may also perform better. - -[NOTE] -==== - The calls to _/_ become 64-bit when the target functions are out of the ±1MB range, increasing the prologue/epilogue size to 22-bytes. -==== - -[NOTE] -==== - POP is typically used in tail-calling sequences where _ret_ is not used to return to _ra_ after destroying the stack frame. -==== - -[#pushpop-areg-list] - -===== Stack pointer adjustment handling - -The instructions all automatically adjust the stack pointer by enough to cover the memory required for the registers being saved or restored. -Additionally the _spimm_ field in the encoding allows the stack pointer to be adjusted in additional increments of 16-bytes. There is only a small restricted -range available in the encoding; if the range is insufficient then a separate _c.addi16sp_ can be used to increase the range. - -===== Register list handling - -There is no support for the _{ra, s0-s10}_ register list without also adding _s11_. Therefore the _{ra, s0-s11}_ register list must be used in this case. - -[#pushpop-idempotent-memory] -==== PUSH/POP Fault handling - -Correct execution requires that _sp_ refers to idempotent memory (also see <>), because the core must be able to -handle traps detected during the sequence. -The entire PUSH/POP sequence is re-executed after returning from the trap handler, and multiple traps are possible during the sequence. - -If a trap occurs during the sequence then _xEPC_ is updated with the PC of the instruction, _xTVAL_ (if not read-only-zero) updated with the bad address if it was an access fault and _xCAUSE_ updated with the type of trap. 
- -NOTE: It is implementation defined whether interrupts can also be taken during the sequence execution. - -[#pushpop-software-view] -==== Software view of execution - -===== Software view of the PUSH sequence - -From a software perspective the PUSH sequence appears as: - -* A sequence of stores writing the bytes required by the pseudo-code -** The bytes may be written in any order. -** The bytes may be grouped into larger accesses. -** Any of the bytes may be written multiple times. -* A stack pointer adjustment - -[NOTE] -==== - If an implementation allows interrupts during the sequence, and the interrupt handler uses _sp_ to allocate stack memory, then any stores which were executed before the interrupt may be overwritten by the handler. This is safe because the memory is idempotent and the stores will be re-executed when execution resumes. -==== - -The stack pointer adjustment must only be committed only when it is certain that the entire PUSH instruction will commit. - -Stores may also return imprecise faults from the bus. -It is platform defined whether the core implementation waits for the bus responses before continuing to the final stage of the sequence, -or handles errors responses after completing the PUSH instruction. 
- -<<< - -For example: - -[source,sail] --- -cm.push {ra, s0-s5}, -64 --- - -Appears to software as: - -[source,sail] --- -# any bytes from sp-1 to sp-28 may be written multiple times before -# the instruction completes therefore these updates may be visible in -# the interrupt/exception handler below the stack pointer -sw s5, -4(sp) -sw s4, -8(sp) -sw s3,-12(sp) -sw s2,-16(sp) -sw s1,-20(sp) -sw s0,-24(sp) -sw ra,-28(sp) - -# this must only execute once, and will only execute after all stores -# completed without any precise faults, therefore this update is only -# visible in the interrupt/exception handler if cm.push has completed -addi sp, sp, -64 --- - -===== Software view of the POP/POPRET sequence - -From a software perspective the POP/POPRET sequence appears as: - -* A sequence of loads reading the bytes required by the pseudo-code. -** The bytes may be loaded in any order. -** The bytes may be grouped into larger accesses. -** Any of the bytes may be loaded multiple times. -* A stack pointer adjustment -* An optional `li a0, 0` -* An optional `ret` - -If a trap occurs during the sequence, then any loads which were executed before the trap may update architectural state. -The loads will be re-executed once the trap handler completes, so the values will be overwritten. -Therefore it is permitted for an implementation to update some of the destination registers before taking a fault. - -The optional `li a0, 0`, stack pointer adjustment and optional `ret` must only be committed only when it is certain that the entire POP/POPRET instruction will commit. - -For POPRET once the stack pointer adjustment has been committed the `ret` must execute. 
- -<<< -For example: - -[source,sail] --- -cm.popretz {ra, s0-s3}, 32; --- - -Appears to software as: - -[source,sail] --- -# any or all of these load instructions may execute multiple times -# therefore these updates may be visible in the interrupt/exception handler -lw s3, 28(sp) -lw s2, 24(sp) -lw s1, 20(sp) -lw s0, 16(sp) -lw ra, 12(sp) - -# these must only execute once, will only execute after all loads -# complete successfully all instructions must execute atomically -# therefore these updates are not visible in the interrupt/exception handler -li a0, 0 -addi sp, sp, 32 -ret --- - -[[pushpop_non-idem-mem]] -==== Non-idempotent memory handling - -An implementation may have a requirement to issue a PUSH/POP instruction to non-idempotent memory. - -If the core implementation does not support PUSH/POP to non-idempotent memories, the core may use an idempotency PMA to detect it and take a -load (POP/POPRET) or store (PUSH) access fault exception in order to avoid unpredictable results. - -Software should only use these instructions on non-idempotent memory regions when software can tolerate the required memory accesses -being issued repeatedly in the case that they cause exceptions. - -<<< - -==== Example RV32I PUSH/POP sequences - -The examples are included show the load/store series expansion and the stack adjustment. -Examples of _cm.popret_ and _cm.popretz_ are not included, as the difference in the expanded sequence from _cm.pop_ is trivial in all cases. 
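
The RV32I expansions in this section follow a regular pattern, which the following minimal sketch reproduces. It is an illustration only, not normative text: the helper names `expand_push` and `expand_pop` and the `num_regs` parameter are hypothetical, and the sketch assumes RV32 (`sw`/`lw`, 4-byte slots) with the total stack adjustment passed in as written in the assembly syntax.

```python
# Register order used by the PUSH/POP register lists: ra, then s0..s11.
REG_ORDER = ["ra"] + [f"s{i}" for i in range(12)]


def _check(num_regs, stack_adj):
    # {ra, s0-s10} (12 registers) cannot be encoded; s11 must be added too.
    if num_regs not in list(range(1, 12)) + [13]:
        raise ValueError("unsupported register list length")
    if stack_adj <= 0 or stack_adj % 16:
        raise ValueError("stack adjustment must be a positive multiple of 16")


def expand_push(num_regs, stack_adj):
    """Store sequence for a PUSH of the first num_regs registers (RV32)."""
    _check(num_regs, stack_adj)
    lines = []
    # The last register in the list is stored just below the old sp.
    for slot, reg in enumerate(reversed(REG_ORDER[:num_regs]), start=1):
        lines.append(f"sw {reg}, {-4 * slot}(sp)")
    lines.append(f"addi sp, sp, {-stack_adj}")
    return lines


def expand_pop(num_regs, stack_adj):
    """Load sequence for a POP of the first num_regs registers (RV32)."""
    _check(num_regs, stack_adj)
    lines = []
    # Loads mirror the stores, measured from the top of the stack frame.
    for slot, reg in enumerate(reversed(REG_ORDER[:num_regs]), start=1):
        lines.append(f"lw {reg}, {stack_adj - 4 * slot}(sp)")
    lines.append(f"addi sp, sp, {stack_adj}")
    return lines


if __name__ == "__main__":
    print("\n".join(expand_push(4, 64)))   # cm.push {ra, s0-s2}, -64
    print("\n".join(expand_pop(5, 48)))    # cm.pop {ra, s0-s3}, 48
```

The order in which the sketch emits the accesses is only one possibility; as noted in the software view, an implementation may perform the byte accesses in any order.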
- -===== cm.push {ra, s0-s2}, -64 - -Encoding: _rlist_=7, _spimm_=3 - -expands to: - -[source,sail] --- -sw s2, -4(sp); -sw s1, -8(sp); -sw s0, -12(sp); -sw ra, -16(sp); -addi sp, sp, -64; --- - -===== cm.push {ra, s0-s11}, -112 - -Encoding: _rlist_=15, _spimm_=3 - -expands to: - -[source,sail] --- -sw s11, -4(sp); -sw s10, -8(sp); -sw s9, -12(sp); -sw s8, -16(sp); -sw s7, -20(sp); -sw s6, -24(sp); -sw s5, -28(sp); -sw s4, -32(sp); -sw s3, -36(sp); -sw s2, -40(sp); -sw s1, -44(sp); -sw s0, -48(sp); -sw ra, -52(sp); -addi sp, sp, -112; --- - -<<< - -===== cm.pop {ra}, 16 - -Encoding: _rlist_=4, _spimm_=0 - -expands to: - -[source,sail] --- -lw ra, 12(sp); -addi sp, sp, 16; --- - -===== cm.pop {ra, s0-s3}, 48 - -Encoding: _rlist_=8, _spimm_=1 - -expands to: - -[source,sail] --- -lw s3, 44(sp); -lw s2, 40(sp); -lw s1, 36(sp); -lw s0, 32(sp); -lw ra, 28(sp); -addi sp, sp, 48; --- - -===== cm.pop {ra, s0-s4}, 64 - -Encoding: _rlist_=9, _spimm_=2 - -expands to: - -[source,sail] --- -lw s4, 60(sp); -lw s3, 56(sp); -lw s2, 52(sp); -lw s1, 48(sp); -lw s0, 44(sp); -lw ra, 40(sp); -addi sp, sp, 64; --- - -include::Zcmp_footer.adoc[] diff --git a/src/zc/pushpop_extra_info.adoc b/src/zc/pushpop_extra_info.adoc deleted file mode 100644 index 342e36d..0000000 --- a/src/zc/pushpop_extra_info.adoc +++ /dev/null @@ -1,23 +0,0 @@ - -[NOTE] -==== - All ABI register mappings are for the UABI. An EABI version is planned once the EABI is frozen. -==== - -For further information see <>. - -Stack Adjustment Calculation:: - -_stack_adj_base_ is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list. - -_spimm_ is the number of additional 16-byte address increments allocated for the stack frame. - -The total stack adjustment represents the total size of the stack frame, which is _stack_adj_base_ added to _spimm_ scaled by 16, -as defined above. 
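
As a rough illustration of the stack adjustment calculation, the sketch below computes _stack_adj_base_ and the total adjustment from _rlist_ and the 2-bit _spimm_ field. The function names are hypothetical, and the round-up-to-16 rule is an inference that happens to agree with the _stack_adj_base_ tables in `pushpop_vars.adoc`; it is not quoted from the specification. The RV32E restriction to _rlist_ values 4 to 6 is not modelled.

```python
def reg_count(rlist):
    """Number of registers selected by rlist (4 => ra only, 15 => ra, s0-s11)."""
    if not 4 <= rlist <= 15:
        raise ValueError("rlist values below 4 are reserved")
    # rlist=15 adds both s10 and s11, so it covers 13 registers, not 12.
    return 13 if rlist == 15 else rlist - 3


def stack_adj_base(rlist, xlen=32):
    """Minimum frame size in bytes, rounded up to a multiple of 16."""
    bytes_needed = reg_count(rlist) * (xlen // 8)
    return (bytes_needed + 15) // 16 * 16


def stack_adj(rlist, spimm, xlen=32):
    """Total stack adjustment: base plus spimm extra 16-byte increments."""
    if not 0 <= spimm <= 3:
        raise ValueError("spimm is a 2-bit field")
    return stack_adj_base(rlist, xlen) + 16 * spimm


if __name__ == "__main__":
    # cm.push {ra, s0-s2}, -64 is encoded with rlist=7, spimm=3 on RV32I.
    print(stack_adj(7, 3))
    # The same register list with no extra increments on RV64.
    print(stack_adj(7, 0, xlen=64))
```

For example, `stack_adj(8, 1)` corresponds to `cm.pop {ra, s0-s3}, 48` on RV32I.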
- -Prerequisites:: -None - -32-bit equivalent:: -No direct equivalent encoding exists - diff --git a/src/zc/pushpop_vars.adoc b/src/zc/pushpop_vars.adoc deleted file mode 100644 index ce25524..0000000 --- a/src/zc/pushpop_vars.adoc +++ /dev/null @@ -1,91 +0,0 @@ - -[source,sail] --- -RV32E: - -switch (rlist){ - case 4: {reg_list="ra"; xreg_list="x1";} - case 5: {reg_list="ra, s0"; xreg_list="x1, x8";} - case 6: {reg_list="ra, s0-s1"; xreg_list="x1, x8-x9";} - default: reserved(); -} -stack_adj = stack_adj_base + spimm[5:4] * 16; --- - -[source,sail] --- -RV32I, RV64: - -switch (rlist){ - case 4: {reg_list="ra"; xreg_list="x1";} - case 5: {reg_list="ra, s0"; xreg_list="x1, x8";} - case 6: {reg_list="ra, s0-s1"; xreg_list="x1, x8-x9";} - case 7: {reg_list="ra, s0-s2"; xreg_list="x1, x8-x9, x18";} - case 8: {reg_list="ra, s0-s3"; xreg_list="x1, x8-x9, x18-x19";} - case 9: {reg_list="ra, s0-s4"; xreg_list="x1, x8-x9, x18-x20";} - case 10: {reg_list="ra, s0-s5"; xreg_list="x1, x8-x9, x18-x21";} - case 11: {reg_list="ra, s0-s6"; xreg_list="x1, x8-x9, x18-x22";} - case 12: {reg_list="ra, s0-s7"; xreg_list="x1, x8-x9, x18-x23";} - case 13: {reg_list="ra, s0-s8"; xreg_list="x1, x8-x9, x18-x24";} - case 14: {reg_list="ra, s0-s9"; xreg_list="x1, x8-x9, x18-x25";} - //note - to include s10, s11 must also be included - case 15: {reg_list="ra, s0-s11"; xreg_list="x1, x8-x9, x18-x27";} - default: reserved(); -} -stack_adj = stack_adj_base + spimm[5:4] * 16; --- - -[source,sail] --- -RV32E: - -stack_adj_base = 16; -Valid values: -stack_adj = [16|32|48|64]; --- - -[source,sail] --- -RV32I: - -switch (rlist) { - case 4.. 7: stack_adj_base = 16; - case 8..11: stack_adj_base = 32; - case 12..14: stack_adj_base = 48; - case 15: stack_adj_base = 64; -} - -Valid values: -switch (rlist) { - case 4.. 
7: stack_adj = [16|32|48| 64]; - case 8..11: stack_adj = [32|48|64| 80]; - case 12..14: stack_adj = [48|64|80| 96]; - case 15: stack_adj = [64|80|96|112]; -} --- - -[source,sail] --- -RV64: - -switch (rlist) { - case 4.. 5: stack_adj_base = 16; - case 6.. 7: stack_adj_base = 32; - case 8.. 9: stack_adj_base = 48; - case 10..11: stack_adj_base = 64; - case 12..13: stack_adj_base = 80; - case 14: stack_adj_base = 96; - case 15: stack_adj_base = 112; -} - -Valid values: -switch (rlist) { - case 4.. 5: stack_adj = [ 16| 32| 48| 64]; - case 6.. 7: stack_adj = [ 32| 48| 64| 80]; - case 8.. 9: stack_adj = [ 48| 64| 80| 96]; - case 10..11: stack_adj = [ 64| 80| 96|112]; - case 12..13: stack_adj = [ 80| 96|112|128]; - case 14: stack_adj = [ 96|112|128|144]; - case 15: stack_adj = [112|128|144|160]; -} --- diff --git a/src/zc/readme.md b/src/zc/readme.md deleted file mode 100644 index 8a333e7..0000000 --- a/src/zc/readme.md +++ /dev/null @@ -1,15 +0,0 @@ -This directory has the latest draft specification for the Zc extensions, without the PDF build. - -To see the latest built version go to: - -https://github.com/riscv/riscv-code-size-reduction/tags - -The benchmarking results for all Zc extensions are here: - -https://docs.google.com/spreadsheets/d/1bFMyGkuuulBXuIaMsjBINoCWoLwObr1l9h5TAWN8s7k/edit#gid=21966619 - -There are many changes since v0.50.1, which has been used for toolchain, spike, qemu and the CV32E41P implementation. - -This shows how the specification has changed from v0.50.1 to the current version: - -https://github.com/riscv/riscv-code-size-reduction/blob/master/Zc-specification/changes_since_v0.50.adoc diff --git a/src/zc/tablejump.adoc b/src/zc/tablejump.adoc deleted file mode 100644 index e490087..0000000 --- a/src/zc/tablejump.adoc +++ /dev/null @@ -1,49 +0,0 @@ -<<< - -[#insns-tablejump,reftext="Table Jump Overview"] -=== Table Jump Overview - -_cm.jt_ (<<#insns-cm_jt>>) and _cm.jalt_ (<<#insns-cm_jalt>>) are referred to as table jump. 
- -Table jump uses a 256-entry XLEN wide table in instruction memory to contain function addresses. -The table must be a minimum of 64-byte aligned. - -Table entries follow the current data endianness. This is different from normal instruction fetch which is always little-endian. - -_cm.jt_ and _cm.jalt_ encodings index the table, giving access to functions within the full XLEN wide address space. - -This is used as a form of dictionary compression to reduce the code size of _jal_ / _auipc+jalr_ / _jr_ / _auipc+jr_ instructions. - -Table jump allows the linker to replace the following instruction sequences with a _cm.jt_ or _cm.jalt_ encoding, and an entry in the table: - -* 32-bit _j_ calls -* 32-bit _jal_ ra calls -* 64-bit _auipc+jr_ calls to fixed locations -* 64-bit _auipc+jalr ra_ calls to fixed locations -** The _auipc+jr/jalr_ sequence is used because the offset from the PC is out of the ±1MB range. - -If a return address stack is implemented, then as _cm.jalt_ is equivalent to _jal ra_, it pushes to the stack. - -==== JVT - -The base of the table is in the JVT CSR (see <>), each table entry is XLEN bits. - -If the same function is called with and without linking then it must have two entries in the table. -This is typically caused by the same function being called with and without tail calling. - -[#tablejump-fault-handling] -==== Table Jump Fault handling - -For a table jump instruction, the table entry that the instruction selects is considered an extension of the instruction itself. -Hence, the execution of a table jump instruction involves two instruction fetches, the first to read the instruction (_cm.jt_/_cm.jalt_) -and the second to read from the jump vector table (JVT). Both instruction fetches are _implicit_ reads, and both require -execute permission; read permission is irrelevant. It is recommended that the second fetch be ignored for hardware triggers and breakpoints. 
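
The location of the second fetch can be pictured with a small sketch based on the table layout given in the overview (256 entries, each XLEN bits wide, table at least 64-byte aligned). The function name is hypothetical, and `jvt_base` is assumed to be the byte address of the table as held in the JVT CSR; the CSR field layout is not modelled here.

```python
def table_entry_address(jvt_base, index, xlen=32):
    """Byte address of the jump vector table entry selected by index."""
    if xlen not in (32, 64):
        raise ValueError("sketch only covers XLEN of 32 or 64")
    if not 0 <= index < 256:
        raise ValueError("the table has 256 entries")
    if jvt_base % 64:
        raise ValueError("the table must be at least 64-byte aligned")
    # Each entry is XLEN bits wide, so entries are XLEN/8 bytes apart.
    return jvt_base + index * (xlen // 8)


if __name__ == "__main__":
    print(hex(table_entry_address(0x8000_0000, 5)))           # RV32
    print(hex(table_entry_address(0x8000_0000, 5, xlen=64)))  # RV64
```

If this fetch faults, the resulting address is the one the text above describes as the second instruction fetch.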
- -Memory writes to the jump vector table require an instruction barrier (_fence.i_) to guarantee that they are visible to the instruction fetch. - -Multiple contexts may have different jump vector tables. JVT may be switched between them without an instruction barrier -if the tables have not been updated in memory since the last _fence.i_. - -If an exception occurs on either instruction fetch, xEPC is set to the PC of the table jump instruction, xCAUSE is set as expected for the type of fault and xTVAL (if not set to zero) contains the fetch address which caused the fault. - -include::Zcmt_footer.adoc[] diff --git a/src/zc/variable_def.adoc b/src/zc/variable_def.adoc deleted file mode 100644 index a660cac..0000000 --- a/src/zc/variable_def.adoc +++ /dev/null @@ -1 +0,0 @@ -The variables used in the assembly syntax are defined below. -- cgit v1.1 From c6c03aab9ef300cdec5c45c6faeddf5d74d2c5ec Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 22 Feb 2024 10:49:32 -0500 Subject: Fix formatting Fixing formatting of rs1' text. --- src/zc.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/zc.adoc b/src/zc.adoc index 28fc904..b407374 100644 --- a/src/zc.adoc +++ b/src/zc.adoc @@ -365,11 +365,11 @@ The immediate offset is formed as follows: Description: -This instruction loads a byte from the memory address formed by adding `__rs1__` to the zero extended immediate _uimm_. The resulting byte is zero extended to XLEN bits and is written to _rd'_. +This instruction loads a byte from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting byte is zero extended to XLEN bits and is written to _rd'_. [NOTE] ==== -`__rd__` and `__rs1__` are from the standard 8-register set x8-x15. +_rd'_ and _rs1'_ are from the standard 8-register set x8-x15. 
==== Prerequisites: -- cgit v1.1 From 0e2393b28ce9b40147645700ab67dcaf40540e7b Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 22 Feb 2024 11:23:43 -0500 Subject: Removing the word proposed. As this spec is ratified, the word proposed no longer applies. --- src/zc.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/zc.adoc b/src/zc.adoc index b407374..85b4590 100644 --- a/src/zc.adoc +++ b/src/zc.adoc @@ -123,7 +123,7 @@ The Zcd extension depends on the <> and D extensions. Zcb has simple code-size saving instructions which are easy to implement on all CPUs. -All proposed encodings are currently reserved for all architectures, and have no conflicts with any existing extensions. +All encodings are currently reserved for all architectures, and have no conflicts with any existing extensions. NOTE: Zcb can be implemented on _any_ CPU as the instructions are 16-bit versions of existing 32-bit instructions from the application class profile. -- cgit v1.1 From 5ce3278b31f40b08045f2847131c6c91fea39d3b Mon Sep 17 00:00:00 2001 From: Stefan O'Rear Date: Thu, 22 Feb 2024 15:03:37 -0500 Subject: Fix field widths in RV32 hstatus --- src/images/bytefield/hstatusreg-rv32.edn | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/images/bytefield/hstatusreg-rv32.edn b/src/images/bytefield/hstatusreg-rv32.edn index 02db585..2762ce6 100644 --- a/src/images/bytefield/hstatusreg-rv32.edn +++ b/src/images/bytefield/hstatusreg-rv32.edn @@ -51,9 +51,9 @@ (draw-box "6" {:span 5 :borders {}}) (draw-box "2" {:span 2 :borders {}}) (draw-box "1" {:borders {}}) -(draw-box "2" {:span 2 :borders {}}) +(draw-box "1" {:span 2 :borders {}}) (draw-box "1" {:span 3 :borders {}}) (draw-box "1" {:span 3 :borders {}}) -(draw-box "2" {:span 2 :borders {}}) +(draw-box "1" {:span 2 :borders {}}) (draw-box "5" {:span 2 :borders {}}) ---- -- cgit v1.1 From 3029ff4d6897eccba17c5d9d59f3550e7db26874 Mon Sep 17 00:00:00 2001 From: Stefan O'Rear Date: Thu, 22 Feb 2024 
15:21:21 -0500 Subject: Fix field widths for RV64-only status register diagrams --- src/images/bytefield/hstatusreg.edn | 4 ++-- src/images/bytefield/hypv-mstatus.edn | 6 +++--- src/images/bytefield/vsstatusreg.edn | 6 +++--- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/src/images/bytefield/hstatusreg.edn b/src/images/bytefield/hstatusreg.edn index cff75db..cce601e 100644 --- a/src/images/bytefield/hstatusreg.edn +++ b/src/images/bytefield/hstatusreg.edn @@ -8,7 +8,7 @@ (def boxes-per-row 32) (draw-box nil {:span 3 :borders {}}) -(draw-box "HSXLEN-1" {:span 8 :borders {} :text-anchor "start"}) +(draw-box "63" {:span 8 :borders {} :text-anchor "start"}) (draw-box "34" {:borders {}}) (draw-box "33" {:span 2 :borders {} :text-anchor "start"}) (draw-box "32" {:span 2 :borders {} :text-anchor "end"}) @@ -31,7 +31,7 @@ (draw-box nil {:span 3 :borders {}}) (draw-box nil {:span 3 :borders {}}) -(draw-box "HSXLEN-34" {:span 9 :borders {}}) +(draw-box "30" {:span 9 :borders {}}) (draw-box "2" {:span 4 :borders {}}) (draw-box "9" {:span 6 :borders {}}) (draw-box "1" {:span 2 :borders {}}) diff --git a/src/images/bytefield/hypv-mstatus.edn b/src/images/bytefield/hypv-mstatus.edn index 2ed4a4d..885dc00 100644 --- a/src/images/bytefield/hypv-mstatus.edn +++ b/src/images/bytefield/hypv-mstatus.edn @@ -7,8 +7,8 @@ (def right-margin 30) (def boxes-per-row 32) -(draw-box "MSXLEN-1" {:span 3 :borders {}}) -(draw-box "MXLEN-2" {:span 4 :text-anchor "start" :borders {}}) +(draw-box "63" {:span 3 :borders {}}) +(draw-box "62" {:span 4 :text-anchor "start" :borders {}}) (draw-box "40" {:span 4 :text-anchor "end" :borders {}}) (draw-box "39" {:span 3 :borders {}}) (draw-box "38" {:span 3 :borders {}}) @@ -31,7 +31,7 @@ (draw-box nil {:borders {:top :border-unrelated :bottom :border-unrelated}}) (draw-box "1" {:span 3 :borders {}}) -(draw-box "MXLEN-41" {:span 8 :borders {}}) +(draw-box "23" {:span 8 :borders {}}) (draw-box "1" {:span 3 :borders {}}) (draw-box "1" {:span 3 
:borders {}}) (draw-box "1" {:span 3 :borders {}}) diff --git a/src/images/bytefield/vsstatusreg.edn b/src/images/bytefield/vsstatusreg.edn index 87f4725..95780a6 100644 --- a/src/images/bytefield/vsstatusreg.edn +++ b/src/images/bytefield/vsstatusreg.edn @@ -7,8 +7,8 @@ (def right-margin 30) (def boxes-per-row 32) -(draw-box "VSXLEN-1" {:span 3 :borders {}}) -(draw-box "VSXLEN-2" {:span 5 :text-anchor "start" :borders {}}) +(draw-box "63" {:span 3 :borders {}}) +(draw-box "62" {:span 5 :text-anchor "start" :borders {}}) (draw-box "34" {:span 5 :text-anchor "end" :borders {}}) (draw-box "33" {:span 2 :text-anchor "start" :borders {}}) (draw-box "32" {:span 2 :text-anchor "end" :borders {}}) @@ -30,7 +30,7 @@ (draw-box nil {:span 2 :borders {}}) (draw-box "1" {:span 3 :borders {}}) -(draw-box "VSXLEN-35" {:span 10 :borders {}}) +(draw-box "29" {:span 10 :borders {}}) (draw-box "2" {:span 4 :borders {}}) (draw-box "12" {:span 6 :borders {}}) (draw-box "1" {:span 2 :borders {}}) -- cgit v1.1 From 1fb3f1452f462d18e538e8d3b7c744813535e204 Mon Sep 17 00:00:00 2001 From: OccupyMars2025 <3119002196@qq.com> Date: Fri, 23 Feb 2024 20:44:39 +0800 Subject: typo --- src/images/wavedrom/ct-unconditional-2.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/images/wavedrom/ct-unconditional-2.adoc b/src/images/wavedrom/ct-unconditional-2.adoc index ef33a9e..4dda824 100644 --- a/src/images/wavedrom/ct-unconditional-2.adoc +++ b/src/images/wavedrom/ct-unconditional-2.adoc @@ -4,7 +4,7 @@ .... 
{reg: [ {bits: 7, name: 'opcode', attr: ['7', 'JALR'], type: 8}, - {bits: 5, name: 'rd', attr: ['6', 'dest'], type: 2}, + {bits: 5, name: 'rd', attr: ['5', 'dest'], type: 2}, {bits: 3, name: 'funct3', attr: ['3', '0'], type: 8}, {bits: 5, name: 'rs1', attr: ['5', 'base'], type: 4}, {bits: 12, name: 'imm[11:0]', attr: ['12', 'offset[11:0]'], type: 3}, -- cgit v1.1 From c9d396762a62db3c481c1bbae0e605bdfae193e5 Mon Sep 17 00:00:00 2001 From: Stefan O'Rear Date: Thu, 22 Feb 2024 16:11:44 -0500 Subject: Explicitly allow side effects for a failed SC Since PTE updates are specified to occur as a result of "memory accesses" it is possible to interpret the old wording as requiring no PTE update for a failed SC, since there is no memory access. However, the PTE update is part of the translation process, which on natural implementations will occur before the reservation validity check can occur. This is consistent with the wording in "Addressing and Memory Protection" which allows traps to prevent the memory access without preventing the PTE update. --- src/a-st-ext.adoc | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/a-st-ext.adoc b/src/a-st-ext.adoc index 396d135..9fae7ab 100644 --- a/src/a-st-ext.adoc +++ b/src/a-st-ext.adoc @@ -62,10 +62,11 @@ if the reservation is still valid and the reservation set contains the bytes being written. If the SC.W succeeds, the instruction writes the word in _rs2_ to memory, and it writes zero to _rd_. If the SC.W fails, the instruction does not write to memory, and it writes a nonzero value -to _rd_. Regardless of success or failure, executing an SC.W instruction -invalidates any reservation held by this hart. LR.D and SC.D act -analogously on doublewords and are only available on RV64. For RV64, -LR.W and SC.W sign-extend the value placed in _rd_. +to _rd_. For the purposes of memory protection, a failed SC.W may be +treated like a store. 
Regardless of success or failure, executing an +SC.W instruction invalidates any reservation held by this hart. LR.D and +SC.D act analogously on doublewords and are only available on RV64. For +RV64, LR.W and SC.W sign-extend the value placed in _rd_. [NOTE] ==== -- cgit v1.1 From bea684d029bd96c2e433b87049f247f93a67c4a7 Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Mon, 26 Feb 2024 07:23:19 -0500 Subject: Update src/zc.adoc Co-authored-by: sorear Signed-off-by: Bill Traynor --- src/zc.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/zc.adoc b/src/zc.adoc index 85b4590..623b8c6 100644 --- a/src/zc.adoc +++ b/src/zc.adoc @@ -321,7 +321,7 @@ Several instructions in this specification use the following new instruction for [NOTE] ==== -c.mul uses the existing CA format +c.mul uses the existing CA format. ==== <<< -- cgit v1.1 From a22d6ec9bfd68be6d54561393d77b48584df039d Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Mon, 26 Feb 2024 07:24:01 -0500 Subject: Update src/zc.adoc Co-authored-by: sorear Signed-off-by: Bill Traynor --- src/zc.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/zc.adoc b/src/zc.adoc index 623b8c6..7622da7 100644 --- a/src/zc.adoc +++ b/src/zc.adoc @@ -278,7 +278,7 @@ state enable if Smstateen is implemented. See <> for details. This extension reuses some encodings from _c.fsdsp_. Therefore it is _incompatible_ with <>, which is included when C and D extensions are both present. -NOTE: Zcmt is primarily targeted at embedded class CPUs due to implementation complexity. Additionally, it is not compatible with architecture class profiles. +NOTE: Zcmt is primarily targeted at embedded class CPUs due to implementation complexity. Additionally, it is not compatible with RVA profiles. The Zcmt extension depends on the <> and Zicsr extensions. 
-- cgit v1.1 From cff8a0ec354e493539b090d9dff7366f6adab9cb Mon Sep 17 00:00:00 2001 From: wmat Date: Mon, 26 Feb 2024 07:45:57 -0500 Subject: Fixes unknown style errors. Running make with the --verbose flag uncovered some unknown style errors that caused the theme to not apply correctly and broke colorization of wavedrom. This fixes those errors and restores colorization. --- src/c-st-ext.adoc | 4 ++-- src/f-st-ext.adoc | 2 +- src/rv32.adoc | 2 +- src/zfh.adoc | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/src/c-st-ext.adoc b/src/c-st-ext.adoc index cfd9538..ca248f6 100644 --- a/src/c-st-ext.adoc +++ b/src/c-st-ext.adoc @@ -298,7 +298,7 @@ registers. ==== Stack-Pointer-Based Loads and Stores include::images/wavedrom/c-sp-load-store.adoc[] -[c-sp-load-store] +[[c-sp-load-store]] //.Stack-Pointer-Based Loads and Stores--these instructions use the CI format. These instructions use the CI format. @@ -336,7 +336,7 @@ _zero_-extended offset, scaled by 8, to the stack pointer, `x2`. It expands to `fld rd, offset(x2)`. include::images/wavedrom/c-sp-load-store-css.adoc[] -[c-sp-load-store-css] +[[c-sp-load-store-css]] //.Stack-Pointer-Based Loads and Stores--these instructions use the CSS format. These instructions use the CSS format. diff --git a/src/f-st-ext.adoc b/src/f-st-ext.adoc index 54d43ca..24941ed 100644 --- a/src/f-st-ext.adoc +++ b/src/f-st-ext.adoc @@ -37,7 +37,7 @@ floating-point register file state can reduce context-switch overhead. [[fprs]] .RISC-V standard F extension single-precision floating-point state -[col[s="<|^|>"|option[s="header",width="50%",align="center"grid="rows"] +[cols="<,^,>",options="header",width="50%",align="center",grid="rows"] |=== | [.small]#FLEN-1#| >| [.small]#0# 3+^| [.small]#f0# diff --git a/src/rv32.adoc b/src/rv32.adoc index 9ce3fb0..bd38ac8 100644 --- a/src/rv32.adoc +++ b/src/rv32.adoc @@ -50,7 +50,7 @@ holds the address of the current instruction. 
[[gprs]] .RISC-V base unprivileged integer register state. -[col[s="<|^|>"|option[s="header",width="50%",align="center"grid="rows"] +[cols="<,^,>",options="header",width="50%",align="center",grid="rows"] |=== <| [.small]#XLEN-1#| >| [.small]#0# 3+^| [.small]#x0/zero# diff --git a/src/zfh.adoc b/src/zfh.adoc index f16514c..9e8710e 100644 --- a/src/zfh.adoc +++ b/src/zfh.adoc @@ -91,7 +91,7 @@ floating-point number to a quad-precision floating-point number, or vice-versa, respectively. include::images/wavedrom/half-prec-flpt-to-flpt-conv.adoc[] -[half-prec-flpt-to-flpt-conv] +[[half-prec-flpt-to-flpt-conv]] Floating-point to floating-point sign-injection instructions, FSGNJ.H, FSGNJN.H, and FSGNJX.H are defined analogously to the single-precision -- cgit v1.1 From 6e2f30e620d1cd0ecd5a2f209b176249ef02c796 Mon Sep 17 00:00:00 2001 From: wmat Date: Mon, 26 Feb 2024 09:02:11 -0500 Subject: Commenting out 32-bit equiv instructions that are broken link. Commented out the 32-bit equivalent instructions that are not present yet and therefore are broken links. --- src/zc.adoc | 69 ++++++++++++++++++++++++++++--------------------------------- 1 file changed, 32 insertions(+), 37 deletions(-) diff --git a/src/zc.adoc b/src/zc.adoc index 7622da7..a00509a 100644 --- a/src/zc.adoc +++ b/src/zc.adoc @@ -375,10 +375,8 @@ _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. Prerequisites: None - -32-bit equivalent: - -<> +//32-bit equivalent: +//<> Operation: @@ -416,7 +414,6 @@ Encoding (RV32, RV64): ],config:{bits:16}} .... - The immediate offset is formed as follows: [source,sail] @@ -438,10 +435,9 @@ _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. Prerequisites: None - -32-bit equivalent: - -<> +//32-bit equivalent: +// +//<> Operation: @@ -500,10 +496,9 @@ _rd'_ and _rs1'_ are from the standard 8-register set x8-x15. 
Prerequisites: None - -32-bit equivalent: - -<> +//32-bit equivalent: +// +//<> Operation: @@ -561,10 +556,10 @@ _rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. Prerequisites: None - -32-bit equivalent: - -<> +// +//32-bit equivalent: +// +//<> Operation: @@ -623,10 +618,10 @@ _rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. Prerequisites: None - -32-bit equivalent: - -<> +// +//32-bit equivalent: +// +//<> Operation: [source,sail] @@ -755,10 +750,10 @@ X(rsdc) = EXTS(X(rsdc)[7..0]); Prerequisites: Zbb is also required. - -32-bit equivalent: - -<> from Zbb +// +//32-bit equivalent: +// +//<> from Zbb [NOTE] ==== @@ -812,10 +807,10 @@ _rd'/rs1'_ is from the standard 8-register set x8-x15. Prerequisites: Zbb is also required. - -32-bit equivalent: - -<> from Zbb +// +//32-bit equivalent: +// +//<> from Zbb [NOTE] ==== @@ -869,10 +864,10 @@ _rd'/rs1'_ is from the standard 8-register set x8-x15. Prerequisites: Zbb is also required. - -32-bit equivalent: - -<> from Zbb +// +//32-bit equivalent: +// +//<> from Zbb [NOTE] ==== @@ -1042,10 +1037,10 @@ _rd'/rs1'_ and _rs2'_ are from the standard 8-register set x8-x15. Prerequisites: M or Zmmul must be configured. - -32-bit equivalent: - -<> +// +//32-bit equivalent: +// +//<> [NOTE] ==== -- cgit v1.1 From 00c80a69090b30d06164853527b3179a2030b5cd Mon Sep 17 00:00:00 2001 From: wmat Date: Mon, 26 Feb 2024 09:21:47 -0500 Subject: Comment out 32-bit instruction sentence with broken link. Comment out 32-bit instruction with broken link. --- src/zc.adoc | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/zc.adoc b/src/zc.adoc index a00509a..8221860 100644 --- a/src/zc.adoc +++ b/src/zc.adoc @@ -731,10 +731,10 @@ _rd'/rs1'_ is from the standard 8-register set x8-x15. Prerequisites: Zbb is also required. 
- -32-bit equivalent: - -<> from Zbb +// +//32-bit equivalent: +// +//<> from Zbb [NOTE] -- cgit v1.1 From d13d0097c305fcf8e21ae5947473ea5907a60caa Mon Sep 17 00:00:00 2001 From: wmat Date: Mon, 26 Feb 2024 10:23:01 -0500 Subject: Fix broken link to PPO Rule 11. Fix broken link to PPO Rule 11. --- src/mm-eplan.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mm-eplan.adoc b/src/mm-eplan.adoc index 1243b1d..470a3ab 100644 --- a/src/mm-eplan.adoc +++ b/src/mm-eplan.adoc @@ -922,7 +922,7 @@ instruction will be followed by a conditional branch checking whether the outcome was successful; this implies that there will be a control dependency from the store operation generated by the SC instruction to any memory operations following the branch. PPO -rule <> in turn implies that any subsequent store +rule <> in turn implies that any subsequent store operations will appear later in the global memory order than the store operation generated by the SC. However, since control, address, and data dependencies are defined over memory operations, and since an -- cgit v1.1 From 633cfdbd3fef616905873ff62f4a4de77f3793f9 Mon Sep 17 00:00:00 2001 From: wmat Date: Mon, 26 Feb 2024 10:25:38 -0500 Subject: Fix broken link to PPO rule 11. Fix broken link to PPO rule 11. --- src/mm-eplan.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mm-eplan.adoc b/src/mm-eplan.adoc index 1243b1d..470a3ab 100644 --- a/src/mm-eplan.adoc +++ b/src/mm-eplan.adoc @@ -922,7 +922,7 @@ instruction will be followed by a conditional branch checking whether the outcome was successful; this implies that there will be a control dependency from the store operation generated by the SC instruction to any memory operations following the branch. PPO -rule <> in turn implies that any subsequent store +rule <> in turn implies that any subsequent store operations will appear later in the global memory order than the store operation generated by the SC. 
However, since control, address, and data dependencies are defined over memory operations, and since an -- cgit v1.1 From dec780bb185ba7d35bb9eb069f9ae8d48467a398 Mon Sep 17 00:00:00 2001 From: wmat Date: Mon, 26 Feb 2024 10:58:18 -0500 Subject: Update admonition icons Old admonition icons were deprecated, this uses newer icons. --- src/resources/themes/riscv-spec.yml | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/src/resources/themes/riscv-spec.yml b/src/resources/themes/riscv-spec.yml index 4aa9535..5cb07c9 100644 --- a/src/resources/themes/riscv-spec.yml +++ b/src/resources/themes/riscv-spec.yml @@ -164,14 +164,17 @@ admonition: padding: [0, $horizontal_rhythm, 0, $horizontal_rhythm] icon: note: - name: pencil-square-o + # name: pencil-square-o + name: far-edit stroke_color: 6489b3 tip: - name: comments-o + #name: comments-o + name: far-comments stroke_color: 646b74 size: 24 important: - name: info + #name: info + name: fas-info-circle stroke_color: 5f8c8b warning: stroke_color: 9c4d4b -- cgit v1.1 From 6eed8aa18de52a6db6ff214fbbf9a7cd3f161225 Mon Sep 17 00:00:00 2001 From: wmat Date: Mon, 26 Feb 2024 11:15:05 -0500 Subject: Fix links to propagate store Propagate store instruction links were broken, fixing. --- src/mm-formal.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mm-formal.adoc b/src/mm-formal.adoc index 2a49696..2d62438 100644 --- a/src/mm-formal.adoc +++ b/src/mm-formal.adoc @@ -1214,7 +1214,7 @@ time if: . every memory store operation that has been forwarded to latexmath:[$i'$] is propagated; . the conditions of <> is satisfied; -. the conditions of <> is satisfied (notice that an `sc` instruction can +. the conditions of <> is satisfied (notice that an `sc` instruction can only have one memory store operation); and . 
for every store slice latexmath:[$msos$] from latexmath:[$msoss$], latexmath:[$msos$] has not been overwritten, in the shared memory, by a @@ -1224,7 +1224,7 @@ since latexmath:[$msos$] was propagated to memory. Action: . apply the actions of <>; and -. apply the action of <>. +. apply the action of <>. [[late_sc_fail]] ===== Late `sc` fail -- cgit v1.1 From aea802b55d4cdba08854dd0f3f2310e1f3386ba7 Mon Sep 17 00:00:00 2001 From: wmat Date: Mon, 26 Feb 2024 11:39:17 -0500 Subject: Fix unordered bulletted list Use asciidoc to make doc match LaTeX unordered bulletted list. --- src/mm-formal.adoc | 63 +++++++++++++++++++++++++++++++++++------------------- 1 file changed, 41 insertions(+), 22 deletions(-) diff --git a/src/mm-formal.adoc b/src/mm-formal.adoc index 2d62438..513c6d3 100644 --- a/src/mm-formal.adoc +++ b/src/mm-formal.adoc @@ -525,7 +525,7 @@ a construction of the post-transition model state for each. Transitions for all instructions: -latexmath:[$\bullet$] <>: This transition represents a fetch and decode of a new instruction instance, as a program order successor of a previously fetched +* <>: This transition represents a fetch and decode of a new instruction instance, as a program order successor of a previously fetched instruction instance (or the initial fetch address). The model assumes the instruction memory is fixed; it does not describe @@ -534,16 +534,17 @@ not generate memory load operations, and the shared memory is not involved in the transition. Instead, the model depends on an external oracle that provides an opcode when given a memory location. -latexmath:[$\circ$] <>: This is a write of a register value. +[circle] +* <>: This is a write of a register value. -latexmath:[$\circ$] <>: This is a read of a register value from the most recent +* <>: This is a read of a register value from the most recent program-order-predecessor instruction instance that writes to that register. 
-latexmath:[$\circ$] <>: This covers pseudocode internal computation: arithmetic, function +* <>: This covers pseudocode internal computation: arithmetic, function calls, etc. -latexmath:[$\circ$] <>: At this point the instruction pseudocode is done, the instruction cannot be restarted, memory accesses cannot be discarded, and all memory +* <>: At this point the instruction pseudocode is done, the instruction cannot be restarted, memory accesses cannot be discarded, and all memory effects have taken place. For conditional branch and indirect jump instructions, any program order successors that were fetched from an address that is not the one that was written to the _pc_ register are @@ -552,15 +553,21 @@ them. Transitions specific to load instructions: -latexmath:[$\circ$] <>: At this point the memory footprint of the load instruction is +[circle] +* <>: At this point the memory footprint of the load instruction is provisionally known (it could change if earlier instructions are restarted) and its individual memory load operations can start being satisfied. -latexmath:[$\bullet$] <>: This partially or entirely satisfies a single memory load operation + +[disc] +* <>: This partially or entirely satisfies a single memory load operation by forwarding, from program-order-previous memory store operations. -latexmath:[$\bullet$] <>: This entirely satisfies the outstanding slices of a single memory + +* <>: This entirely satisfies the outstanding slices of a single memory load operation, from memory. -latexmath:[$\circ$] <>: At this point all the memory load operations of the instruction have + +[circle] +* <>: At this point all the memory load operations of the instruction have been entirely satisfied and the instruction pseudocode can continue executing. A load instruction can be subject to being restarted until the transition. But, under some conditions, the model might treat a load @@ -568,44 +575,56 @@ instruction as non-restartable even before it is finished (e.g. 
see ). Transitions specific to store instructions: -latexmath:[$\circ$] <>: At this point the memory footprint of the store is provisionally +[circle] +* <>: At this point the memory footprint of the store is provisionally known. -latexmath:[$\circ$] <>: At this point the memory store operations have their values and + +* <>: At this point the memory store operations have their values and program-order-successor memory load operations can be satisfied by forwarding from them. -latexmath:[$\circ$] <>: At this point the store operations are guaranteed to happen (the + +* <>: At this point the store operations are guaranteed to happen (the instruction can no longer be restarted or discarded), and they can start being propagated to memory. -latexmath:[$\bullet$] <>: This propagates a single memory store operation to memory. -latexmath:[$\circ$] <>: At this point all the memory store operations of the instruction + +[disc] +* <>: This propagates a single memory store operation to memory. + +[circle] +* <>: At this point all the memory store operations of the instruction have been propagated to memory, and the instruction pseudocode can continue executing. Transitions specific to `sc` instructions: -latexmath:[$\bullet$] <>: This causes the `sc` to fail, either a spontaneous fail or because -it is not paired with a program-order-previous `lr`. -latexmath:[$\bullet$] <>: This transition indicates the `sc` is paired with an `lr` and might +[disc] +* <>: This causes the `sc` to fail, either a spontaneous fail or becauset is not paired with a program-order-previous `lr`. + +* <>: This transition indicates the `sc` is paired with an `lr` and might succeed. -latexmath:[$\bullet$] <>: This is an atomic execution of the transitions <> and <>, it is enabled + +* <>: This is an atomic execution of the transitions <> and <>, it is enabled only if the stores from which the `lr` read from have not been overwritten. 
-latexmath:[$\bullet$] <>: This causes the `sc` to fail, either a spontaneous fail or because + +* <>: This causes the `sc` to fail, either a spontaneous fail or because the stores from which the `lr` read from have been overwritten. Transitions specific to AMO instructions: -latexmath:[$\bullet$] <>: This is an atomic execution of all the transitions needed to satisfy +[disc] +* <>: This is an atomic execution of all the transitions needed to satisfy the load operation, do the required arithmetic, and propagate the store operation. Transitions specific to fence instructions: -latexmath:[$\circ$] <> +[circle] +* <> The transitions labeled latexmath:[$\circ$] can always be taken eagerly, as soon as their precondition is satisfied, without excluding other -behavior; the latexmath:[$\bullet$] cannot. Although is marked with a +behavior; the latexmath:[$\bullet$] cannot. Although <> is marked with a latexmath:[$\bullet$], it can be taken eagerly as long as it is not taken infinitely many times. -- cgit v1.1 From 1423bfa7b90767b752c95b03ae50011f088997f8 Mon Sep 17 00:00:00 2001 From: wmat Date: Mon, 26 Feb 2024 13:12:01 -0500 Subject: Fix nested unordered list. Making nested unordered list match LaTeX. --- src/mm-formal.adoc | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/mm-formal.adoc b/src/mm-formal.adoc index 513c6d3..fb89914 100644 --- a/src/mm-formal.adoc +++ b/src/mm-formal.adoc @@ -560,8 +560,7 @@ restarted) and its individual memory load operations can start being satisfied. [disc] -* <>: This partially or entirely satisfies a single memory load operation -by forwarding, from program-order-previous memory store operations. +* <>: This partially or entirely satisfies a single memory load operation by forwarding, from program-order-previous memory store operations. * <>: This entirely satisfies the outstanding slices of a single memory load operation, from memory. 
-- cgit v1.1 From 740745ddaaf5afd3fee5a25be970b67309516c48 Mon Sep 17 00:00:00 2001 From: Yaksis <59007159+Yakkhini@users.noreply.github.com> Date: Tue, 27 Feb 2024 07:55:52 +0000 Subject: Fix typo: Quotation mark formatting mismatch in intro.adoc. --- src/intro.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/intro.adoc b/src/intro.adoc index 53379e7..78d7a34 100644 --- a/src/intro.adoc +++ b/src/intro.adoc @@ -195,7 +195,7 @@ environment but must do so in a way that guest harts operate like independent hardware threads. In particular, if there are more guest harts than host harts then the execution environment must be able to preempt the guest harts and must not wait indefinitely for guest -software on a guest hart to “yield" control of the guest hart. +software on a guest hart to "yield" control of the guest hart. ==== === RISC-V ISA Overview -- cgit v1.1 From 51afb41c36d71b98ce0bbb9002b3f405bcb5e93c Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 27 Feb 2024 11:48:32 -0500 Subject: Moving cmo chapter acknowledgements to header. Moving contributors from the cmo chapter to the contributors section of the header file. --- src/cmo.adoc | 19 ------------------- src/riscv-unprivileged.adoc | 6 +++--- 2 files changed, 3 insertions(+), 22 deletions(-) diff --git a/src/cmo.adoc b/src/cmo.adoc index 648c4ec..705166a 100644 --- a/src/cmo.adoc +++ b/src/cmo.adoc @@ -1,25 +1,6 @@ [[cmo]] == Base Cache Management Operation ISA Extensions -[acknowledgments] -=== Acknowledgments - -Contributors to this specification (in alphabetical order) include: + -Allen Baum, -Paul Donahue, -Greg Favor, -Andy Glew, -John Ingalls, -David Kruckemyer, -Josh Scheid, -Philipp Tomsich, -Paul Walmsley, -and -Derek Williams - -We express our gratitude to everyone that contributed to, reviewed, or improved -this specification through their comments and questions. 
- === Pseudocode for instruction semantics The semantics of each instruction in the <<#insns>> chapter is expressed in a diff --git a/src/riscv-unprivileged.adoc b/src/riscv-unprivileged.adoc index 0936b42..1f4289f 100644 --- a/src/riscv-unprivileged.adoc +++ b/src/riscv-unprivileged.adoc @@ -52,11 +52,11 @@ endif::[] _Contributors to all versions of the spec in alphabetical order (please contact editors to suggest corrections): Arvind, Krste Asanović, Rimas Avižienis, Jacob Bachmeyer, Christopher F. Batten, Allen J. Baum, Abel Bernabeu, Alex Bradbury, Scott Beamer, Preston Briggs, Christopher Celio, Chuanhua -Chang, David Chisnall, Paul Clayton, Palmer Dabbelt, Ken Dockser, Paul Donahue, Aaron Durbin, Roger Espasa, Greg Favor, Shaked Flur, Stefan Freudenberger, Marc Gauthier, Andy Glew, Jan Gray, Michael Hamburg, John -Hauser, David Horner, Bruce Hoult, Bill Huffman, Alexandre Joannou, Olof Johansson, Ben Keller, +Chang, David Chisnall, Paul Clayton, Palmer Dabbelt, Ken Dockser, Paul Donahue, Aaron Durbin, Roger Espasa, Greg Favor, Andy Glew, Shaked Flur, Stefan Freudenberger, Marc Gauthier, Andy Glew, Jan Gray, Michael Hamburg, John +Hauser, John Ingalls, David Horner, Bruce Hoult, Bill Huffman, Alexandre Joannou, Olof Johansson, Ben Keller, David Kruckemyer, Tariq Kurd, Yunsup Lee, Paul Loewenstein, Daniel Lustig, Yatin Manerkar, Luc Maranget, Margaret Martonosi, Phil McCoy, Christoph Müllner, Joseph Myers, Vijayanand Nagarajan, Rishiyur Nikhil, Jonas Oberhauser, Stefan O'Rear, Albert Ou, John Ousterhout, David Patterson, Christopher Pulte, Jose Renau, -Josh Scheid, Colin Schmidt, Peter Sewell, Susmit Sarkar, Ved Shanbhogue, Michael Taylor, Wesley Terpstra, Matt Thomas, Tommy Thorn, Philipp Tomsich, Caroline Trippel, Ray VanDeWalker, Muralidaran Vijayaraghavan, Megan Wachs, Andrew Waterman, Robert Watson, David Weaver, Derek Williams, Andrew Wright, Reinoud Zandijk, +Josh Scheid, Colin Schmidt, Peter Sewell, Susmit Sarkar, Ved Shanbhogue, Michael Taylor, Wesley 
Terpstra, Matt Thomas, Tommy Thorn, Philipp Tomsich, Caroline Trippel, Ray VanDeWalker, Muralidaran Vijayaraghavan, Megan Wachs, Paul Wamsley Andrew Waterman, Robert Watson, David Weaver, Derek Williams, Andrew Wright, Reinoud Zandijk, and Sizhuo Zhang._ _This document is released under a Creative Commons Attribution 4.0 International License._ -- cgit v1.1 From 36b5ce1e7480b44c407bbf08249ccd7018f713a3 Mon Sep 17 00:00:00 2001 From: Andrew Waterman Date: Tue, 27 Feb 2024 13:25:22 -0800 Subject: MAGs are NAPOT --- src/machine.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/machine.adoc b/src/machine.adoc index d9e9042..640a794 100644 --- a/src/machine.adoc +++ b/src/machine.adoc @@ -2631,7 +2631,7 @@ progress is detected. The misaligned atomicity granule PMA provides constrained support for misaligned AMOs. This PMA, if present, specifies the size of a _misaligned atomicity granule_, -a power-of-two number of bytes. +a naturally aligned power-of-two number of bytes. Specific supported values for this PMA are represented by MAG__NN__, e.g., MAG16 indicates the misaligned atomicity granule is at least 16 bytes. -- cgit v1.1 From da025f60c7e1a824548432cf13b7a9f761542654 Mon Sep 17 00:00:00 2001 From: Andrew Waterman Date: Tue, 27 Feb 2024 14:19:55 -0800 Subject: Clarify when SFENCE.W.INVAL/SFENCE.INVAL.IR are legal --- src/supervisor.adoc | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/src/supervisor.adoc b/src/supervisor.adoc index 2b30893..e9f2855 100644 --- a/src/supervisor.adoc +++ b/src/supervisor.adoc @@ -2021,6 +2021,12 @@ or VU-mode, or to execute SINVAL.VMA in VU-mode, raises a virtual-instruction exception. When `hstatus`.VTVM=1, an attempt to execute SINVAL.VMA in VS-mode also raises a virtual instruction exception. +Attempting to execute SFENCE.W.INVAL or SFENCE.INVAL.IR in U-mode +raises an illegal-instruction exception. +Doing so in VU-mode raises a virtual-instruction exception. 
+SFENCE.W.INVAL and SFENCE.INVAL.IR are unaffected by the `mstatus`.TVM and +`hstatus`.VTVM fields and hence are always permitted in S-mode and VS-mode. + [NOTE] ==== SFENCE.W.INVAL and SFENCE.INVAL.IR instructions do not need to be -- cgit v1.1 From ab93a86dbaa98ba15b5dd7ac6ddb1f7cd8316d5c Mon Sep 17 00:00:00 2001 From: Rafael Sene Date: Wed, 28 Feb 2024 18:07:46 -0300 Subject: Add new Action to Release a new ISA when merging a PR --- .github/workflows/merge-and-release.yml | 96 +++++++++++++++++++++++++++++++++ 1 file changed, 96 insertions(+) create mode 100644 .github/workflows/merge-and-release.yml diff --git a/.github/workflows/merge-and-release.yml b/.github/workflows/merge-and-release.yml new file mode 100644 index 0000000..0fafcc4 --- /dev/null +++ b/.github/workflows/merge-and-release.yml @@ -0,0 +1,96 @@ +name: Release New ISA When Merging a PR + +on: + pull_request: + branches: + - main + types: + - closed + +jobs: + if_merged: + if: github.event.pull_request.merged == true + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - run: | + echo The PR was successfully merged. 
+ + - name: Set short SHA + run: echo "SHORT_SHA=$(echo ${GITHUB_SHA::7})" >> $GITHUB_ENV + + - name: Get current date + run: echo "CURRENT_DATE=$(date +'%Y-%m-%d')" >> $GITHUB_ENV + + - name: Pull Container + id: pull_container_image + run: | + docker pull riscvintl/riscv-docs-base-container-image:latest + + - name: Build Files + id: build_files + if: steps.pull_container_image.outcome == 'success' + run: | + docker run --rm -v ${{ github.workspace }}:/build riscvintl/riscv-docs-base-container-image:latest \ + /bin/sh -c 'cd ./build && make' + + # Upload the priv-isa-asciidoc PDF file + - name: Upload priv-isa-asciidoc.pdf + if: steps.build_files.outcome == 'success' + uses: actions/upload-artifact@v4 + with: + name: priv-isa-asciidoc-${{ env.SHORT_SHA }}.pdf + path: ${{ github.workspace }}/build/priv-isa-asciidoc.pdf + + # Upload the priv-isa-asciidoc HTML file + - name: Upload priv-isa-asciidoc.html + if: steps.build_files.outcome == 'success' + uses: actions/upload-artifact@v4 + with: + name: priv-isa-asciidoc-${{ env.SHORT_SHA }}.html + path: ${{ github.workspace }}/build/priv-isa-asciidoc.html + + # Upload the unpriv-isa-asciidoc PDF file + - name: Upload unpriv-isa-asciidoc.pdf + if: steps.build_files.outcome == 'success' + uses: actions/upload-artifact@v4 + with: + name: unpriv-isa-asciidoc-${{ env.SHORT_SHA }}.pdf + path: ${{ github.workspace }}/build/unpriv-isa-asciidoc.pdf + + # Upload the unpriv-isa-asciidoc HTML file + - name: Upload unpriv-isa-asciidoc.html + if: steps.build_files.outcome == 'success' + uses: actions/upload-artifact@v4 + with: + name: unpriv-isa-asciidoc-${{ env.SHORT_SHA }}.html + path: ${{ github.workspace }}/build/unpriv-isa-asciidoc.html + + # Upload the priv-isa-latex PDF file + - name: Upload riscv-privileged.pdf + if: steps.build_files.outcome == 'success' + uses: actions/upload-artifact@v4 + with: + name: riscv-privileged-latex-${{ env.SHORT_SHA }}.pdf + path: ${{ github.workspace }}/build/riscv-privileged.pdf + + - name: 
Create Release + uses: softprops/action-gh-release@v1 + env: + GITHUB_TOKEN: ${{ secrets.GHTOKEN }} + with: + tag_name: riscv-isa-release-${{ env.SHORT_SHA }}-${{ env.CURRENT_DATE }} + release_name: Release riscv-isa-release-${{ env.SHORT_SHA }}-${{ env.CURRENT_DATE }} + draft: false + prerelease: false + generate_release_notes: true + body: | + This release was created by: ${{ github.event.sender.login }} + RISC-V ISA released generated based on commit ${{ env.SHORT_SHA }} + files: | + ${{ github.workspace }}/build/priv-isa-asciidoc.pdf + ${{ github.workspace }}/build/priv-isa-asciidoc.html + ${{ github.workspace }}/build/unpriv-isa-asciidoc.pdf + ${{ github.workspace }}/build/unpriv-isa-asciidoc.html + ${{ github.workspace }}/build/riscv-privileged.pdf -- cgit v1.1 From 082937a8071debe1f471c404f354d7c1f85b655a Mon Sep 17 00:00:00 2001 From: Bill Traynor Date: Thu, 29 Feb 2024 07:54:52 -0500 Subject: Update merge-and-release.yml Removing refs to latex priv. Signed-off-by: Bill Traynor --- .github/workflows/merge-and-release.yml | 9 --------- 1 file changed, 9 deletions(-) diff --git a/.github/workflows/merge-and-release.yml b/.github/workflows/merge-and-release.yml index 0fafcc4..2de9268 100644 --- a/.github/workflows/merge-and-release.yml +++ b/.github/workflows/merge-and-release.yml @@ -67,14 +67,6 @@ jobs: name: unpriv-isa-asciidoc-${{ env.SHORT_SHA }}.html path: ${{ github.workspace }}/build/unpriv-isa-asciidoc.html - # Upload the priv-isa-latex PDF file - - name: Upload riscv-privileged.pdf - if: steps.build_files.outcome == 'success' - uses: actions/upload-artifact@v4 - with: - name: riscv-privileged-latex-${{ env.SHORT_SHA }}.pdf - path: ${{ github.workspace }}/build/riscv-privileged.pdf - - name: Create Release uses: softprops/action-gh-release@v1 env: @@ -93,4 +85,3 @@ jobs: ${{ github.workspace }}/build/priv-isa-asciidoc.html ${{ github.workspace }}/build/unpriv-isa-asciidoc.pdf ${{ github.workspace }}/build/unpriv-isa-asciidoc.html - ${{ 
github.workspace }}/build/riscv-privileged.pdf -- cgit v1.1 From 94d2dc31910691f778a5cc6168ec00591537715c Mon Sep 17 00:00:00 2001 From: Rafael Sene Date: Thu, 29 Feb 2024 10:08:20 -0300 Subject: Update merge-and-release.yml This commit improves the message set as part of the release notes. Signed-off-by: Rafael Sene --- .github/workflows/merge-and-release.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/merge-and-release.yml b/.github/workflows/merge-and-release.yml index 2de9268..3a93bc2 100644 --- a/.github/workflows/merge-and-release.yml +++ b/.github/workflows/merge-and-release.yml @@ -79,7 +79,7 @@ jobs: generate_release_notes: true body: | This release was created by: ${{ github.event.sender.login }} - RISC-V ISA released generated based on commit ${{ env.SHORT_SHA }} + Release of RISC-V ISA, built from commit ${{ env.SHORT_SHA }}, is now available. files: | ${{ github.workspace }}/build/priv-isa-asciidoc.pdf ${{ github.workspace }}/build/priv-isa-asciidoc.html -- cgit v1.1 From 5f31074971f13fee884b16e43dde733858625269 Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 29 Feb 2024 10:08:40 -0500 Subject: Fix xref text Adding alternate text for xrefs --- src/zc.adoc | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/zc.adoc b/src/zc.adoc index 8221860..e217797 100644 --- a/src/zc.adoc +++ b/src/zc.adoc @@ -63,7 +63,7 @@ As C defines the same instructions as Zca, Zcf and Zcd, the rule is that: * C+F implies Zcf (RV32 only) * C+D implies Zcd -[#Zce] +[reftext="Zce"] === Zce The Zce extension is intended to be used for microcontrollers, and includes all relevant Zc extensions. 
@@ -90,7 +90,7 @@ MISA.C is set if the following extensions are selected: * Zca, Zcd if D is specified (RV64 only) ** this configuration excludes Zcmp, Zcmt -[#Zca,Zca] +[reftext="Zca"] === Zca The Zca extension is added as way to refer to instructions in the C extension that do not include the floating-point loads and stores. @@ -102,7 +102,7 @@ Therefore it _excluded_ all 16-bit floating point loads and stores: _c.flw_, _c. the C extension only includes F/D instructions when D and F are also specified ==== -[#Zcf] +[reftext="Zcf"] === Zcf (RV32 only) Zcf is the existing set of compressed single precision floating point loads and stores: _c.flw_, _c.flwsp_, _c.fsw_, _c.fswsp_. @@ -111,14 +111,14 @@ Zcf is only relevant to RV32, it cannot be specified for RV64. The Zcf extension depends on the <> and F extensions. -[#Zcd] +[reftext="Zcd"] === Zcd Zcd is the existing set of compressed double precision floating point loads and stores: _c.fld_, _c.fldsp_, _c.fsd_, _c.fsdsp_. The Zcd extension depends on the <> and D extensions. -[#Zcb] +[reftext="Zcb"] === Zcb Zcb has simple code-size saving instructions which are easy to implement on all CPUs. @@ -1304,7 +1304,7 @@ addi sp, sp, 32 ret ---- -[[pushpop_non-idem-mem]] +[[pushpop_non-idem-mem,Non-idempotent memory handling]] ==== Non-idempotent memory handling An implementation may have a requirement to issue a PUSH/POP instruction to non-idempotent memory. -- cgit v1.1 From f714cd151aaafc9b60d9b2484660c15f90db349e Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 29 Feb 2024 10:32:34 -0500 Subject: Changes from tariq's feedback Fixed xref alternate text etc. --- src/zc.adoc | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/zc.adoc b/src/zc.adoc index e217797..2f2ef37 100644 --- a/src/zc.adoc +++ b/src/zc.adoc @@ -973,7 +973,7 @@ This instruction takes the one's complement of _rd'/rs1'_ and writes the result [NOTE] ==== - _rd'/rs1'_ is from the standard 8-register set x8-x15. 
+rd'/rs1' is from the standard 8-register set x8-x15. ==== Prerequisites: @@ -1410,7 +1410,7 @@ addi sp, sp, 64; <<< -[#insns-cm_push,reftext="Create stack frame: push registers, allocate additional stack space."] +[#insns-cm_push,reftext="cm.push"] ==== cm.push Synopsis: @@ -1605,7 +1605,7 @@ sp-=stack_adj; ---- <<< -[#insns-cm_pop,reftext="Pop registers, deallocate stack frame."] +[#insns-cm_pop,reftext="cm.pop"] ==== cm.pop Synopsis: @@ -1799,7 +1799,7 @@ sp+=stack_adj; ---- <<< -[#insns-cm_popretz,reftext="Pop registers, deallocate stack frame, return zero."] +[#insns-cm_popretz,reftext="cm.popretz"] ==== cm.popretz Synopsis: @@ -1997,7 +1997,7 @@ asm("ret"); ---- <<< -[#insns-cm_popret,reftext="Pop registers, deallocate stack frame, return."] +[#insns-cm_popret,reftext="cm.popret"] ==== cm.popret Synopsis: -- cgit v1.1 From d1f32027195156d4192e41b7cb4a0f9bf0f93dc9 Mon Sep 17 00:00:00 2001 From: Piotr Wegrzyn Date: Thu, 11 Jan 2024 16:47:14 +0100 Subject: Add marchid for Coreblocks Signed-off-by: Piotr Wegrzyn --- marchid.md | 1 + 1 file changed, 1 insertion(+) diff --git a/marchid.md b/marchid.md index 79f5e6d..82af726 100644 --- a/marchid.md +++ b/marchid.md @@ -61,3 +61,4 @@ ApogeoRV | Gabriele Tripi | [Gabriele Tripi](mailto:tripi. MicroRV32 | AGRA, Group of Computer Architecture, University of Bremen | [RISC-V @ AGRA](mailto:riscv@informatik.uni-bremen.de) | 41 | https://github.com/agra-uni-bremen/microrv32 QEMU | qemu.org | [QEMU Mailing List](mailto:qemu-riscv@nongnu.org) | 42 | https://qemu.org KianV | Hirosh Dabui | [Hirosh Dabui](mailto:hirosh@dabui.de) | 43 | https://github.com/splinedrive/kianRiscV +Coreblocks | Kuźnia Rdzeni, University of Wrocław | [Coreblocks Team](mailto:coreblocks@cs.uni.wroc.pl) | 44 | https://github.com/kuznia-rdzeni/coreblocks -- cgit v1.1 From 497dbc9f75ead61757e153ee041e120c991e16ba Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 29 Feb 2024 14:35:52 -0500 Subject: Clean up.
Replace note with info as we have no info admonition. --- src/smepmp.adoc | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/smepmp.adoc b/src/smepmp.adoc index c52d77c..547f723 100644 --- a/src/smepmp.adoc +++ b/src/smepmp.adoc @@ -134,12 +134,13 @@ Since PMP rules with a higher priority override rules with a lower priority, loc + An attacker that manages to tamper with a memory region used by S/U mode, even after successfully tricking a process running in M-mode to use or execute that region, will fail to perform a successful attack since that region will be _S/U-mode-only_ hence any access when in M-mode will trigger an access exception. + -[INFO] +[NOTE] ==== In order to support zero-copy transfers between M-mode and S/U-mode we need to either allow shared memory regions, or introduce a mechanism similar to the ``sstatus.SUM`` bit to temporary allow the high-privileged mode (in this case M-mode) to be able to perform loads and stores on the region of a less-privileged process (in this case S/U-mode). In our case after discussion within the group it seemed a better idea to follow the first approach and have this functionality encoded on a per-rule basis to avoid the risk of leaving a temporary, global bypass active when exiting M-mode, hence rendering memory access prevention useless. ==== + -[INFO] + +[NOTE] ==== Although it’s possible to use ``mstatus.MPRV`` in M-mode to read/write data on an _S/U-mode-only_ region using general purpose registers for copying, this will happen with S/U-mode permissions, honoring any MMU restrictions put in place by S-mode. Of course it’s still possible for M-mode to tamper with the page tables and / or add _S/U-mode-only_ rules and bypass the protections put in place by S-mode but if an attacker has managed to compromise M-mode to such extent, no security guarantees are possible in any way. *Also note that the threat model we present here assumes buggy software in M-mode, not compromised software*.
We considered disabling ``mstatus.MPRV`` but it seemed too much and out of scope. ==== @@ -153,12 +154,12 @@ To make sure that shared data regions can’t be executed and shared code region For adding _Shared-region_ rules with executable privileges to share code segments between M-mode and S/U-mode, ``mseccfg.RLB`` needs to be implemented, or else such rules can only be added together with ``mseccfg.MML`` being set on *PMP Reset*. That's because the reserved encoding ``pmpcfg.RW=01`` being used for _Shared-region_ rules is only defined when ``mseccfg.MML`` is set, and 4b prevents the adition of rules with executable privileges on M-mode after ``mseccfg.MML`` is set unless ``mseccfg.RLB`` is also set. ==== + -[INFO] +[NOTE] ==== Using the ``pmpcfg.LRWX=1111`` encoding for a locked shared read-only data region was decided later on, its initial meaning was an M-mode-only read/write/execute region. The reason for that change was that the already defined shared data regions were not locked, so r/w access to M-mode couldn’t be restricted. In the same way we have execute-only shared code regions for both modes, it was decided to also be able to allow a least-privileged shared data region for both modes. This approach allows for example to share the .text section of an ELF with a shared code region and the .rodata section with a locked shared data region, without allowing M-mode to modify .rodata. We also decided that having a locked read/write/execute region in M-mode doesn’t make much sense and could be dangerous, since M-mode won’t be able to add further restrictions there (as in the case of S/U-mode where S-mode can further limit access to an ``pmpcfg.LWRX=0111`` region through the MMU), leaving the possibility of modifying an executable region in M-mode open.
==== + -[INFO] +[NOTE] ==== For encoding Shared-region rules initially we used one of the two reserved bits on pmpcfg (bit 5) but in order to avoid allocating an extra bit, since those bits are a very limited resource, it was decided to use the reserved R=0,W=1 combination. ==== -- cgit v1.1 From 10700d0766501c9a4ce16c7cab53c908c44d5158 Mon Sep 17 00:00:00 2001 From: Andrew Waterman Date: Tue, 20 Feb 2024 18:07:55 -0800 Subject: Update Smrnmi to account for Smdbltrp extension This is a backwards compatible change to support resumable double traps within M-mode when both extensions are implemented. Although the resumability feature is not terribly interesting, the reuse of the Smrnmi state for M-mode double traps aids debugging (and fault diagnosis for the hardware-error exception case). --- src/images/bytefield/mncause.edn | 6 +++--- src/rnmi.adoc | 12 +++++++++--- 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/src/images/bytefield/mncause.edn b/src/images/bytefield/mncause.edn index 5323f24..0b56e9b 100644 --- a/src/images/bytefield/mncause.edn +++ b/src/images/bytefield/mncause.edn @@ -8,9 +8,9 @@ (def boxes-per-row 32) (draw-column-headers {:height 24 :font-size 24 :labels (reverse ["0" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "MXLEN-2" "" "" "" "MXLEN-1" ""])}) -(draw-box "1" {:span 4}) -(draw-box (text "NMI Cause" {:font-size 24}) {:span 14 :text-anchor "end" :borders {:top :border-unrelated :bottom :border-unrelated :left :border-unrelated}}) +(draw-box "Interrupt" {:span 4}) +(draw-box (text "Exception Code" {:font-size 24}) {:span 14 :text-anchor "end" :borders {:top :border-unrelated :bottom :border-unrelated :left :border-unrelated}}) (draw-box (text "(WARL)" {:font-weight "bold" :font-size 24}) {:span 14 :text-anchor "start" :borders {:top :border-unrelated :right :border-unrelated :bottom :border-unrelated}}) (draw-box "1" {:span 4 :borders {}}) (draw-box "MXLEN-1" {:font-size 24 :span 28 :borders {}}) 
----- \ No newline at end of file +---- diff --git a/src/rnmi.adoc b/src/rnmi.adoc index 9938917..cbd19da 100644 --- a/src/rnmi.adoc +++ b/src/rnmi.adoc @@ -71,9 +71,15 @@ of holding. .Resumable NMI cause `mncause`. include::images/bytefield/mncause.edn[] -The `mncause` CSR holds the reason for the NMI, with bit MXLEN-1 set to -1, and the NMI cause encoded in the least-significant bits or zero if -NMI causes are not supported. +The `mncause` CSR holds the reason for the NMI. +If the reason is an interrupt, bit MXLEN-1 is set to 1, and the NMI +cause is encoded in the least-significant bits. +If the reason is an interrupt and NMI causes are not supported, bit MXLEN-1 is +set to 1, and zero is written to the least-significant bits. +If the reason is an exception within M-mode that results in a double trap as +specified in the Smdbltrp extension, bit MXLEN-1 is set to 0 and the +least-significant bits are set to the cause code corresponding to the +exception that precipitated the double trap. .Resumable NMI status register `mnstatus`. include::images/bytefield/mnstatus.edn[] -- cgit v1.1 From b69129e737e80c869397d29b33d0aed1e35e0449 Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 5 Mar 2024 08:48:02 -0500 Subject: Set smstateen chapter version to 1.0.0 Set the version of the State Eneable Extension chspter to 1.0.0 --- src/smstateen.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/smstateen.adoc b/src/smstateen.adoc index 360eb31..b92f96e 100644 --- a/src/smstateen.adoc +++ b/src/smstateen.adoc @@ -1,5 +1,5 @@ [[smstateen]] -== "Smststeen" State Enable Extension +== "Smststeen" State Enable Extension, Version 1.0.0 === Motivation -- cgit v1.1 From f04e8d99db87b6ac73a849853466651119db4ca7 Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 5 Mar 2024 09:31:08 -0500 Subject: Add version 1.0.0 to sstc chapter Adding version 1.0.0 to the Stimecpm/Vstimecmp extension chapter. 
--- src/sstc.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/sstc.adoc b/src/sstc.adoc index 82e10fc..35e227d 100644 --- a/src/sstc.adoc +++ b/src/sstc.adoc @@ -1,5 +1,5 @@ [[Sstc]] -== "Stimecpm/Vstimecmp" Extension +== "Stimecpm/Vstimecmp" Extension, Version 1.0.0 The current Privileged arch specification only defines a hardware mechanism for generating machine-mode timer interrupts (based on the mtime and mtimecmp -- cgit v1.1 From e62a917975782c5e0e2f31dd5386a15041b668af Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 5 Mar 2024 09:32:28 -0500 Subject: Fix typo of Stimecmp Stimecmp typo was Stimecpm --- src/sstc.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/sstc.adoc b/src/sstc.adoc index 35e227d..8e7a8e7 100644 --- a/src/sstc.adoc +++ b/src/sstc.adoc @@ -1,5 +1,5 @@ [[Sstc]] -== "Stimecpm/Vstimecmp" Extension, Version 1.0.0 +== "Stimecmp/Vstimecmp" Extension, Version 1.0.0 The current Privileged arch specification only defines a hardware mechanism for generating machine-mode timer interrupts (based on the mtime and mtimecmp -- cgit v1.1 From 532ddc2703dd1ad9a24d242445115a39b4c5bbbf Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 5 Mar 2024 09:47:34 -0500 Subject: Add version 1.0.0 to chapter Add version 1.0.0 to count overflow chapter --- src/sscofpmt.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/sscofpmt.adoc b/src/sscofpmt.adoc index e5a8bdd..e5d48b8 100644 --- a/src/sscofpmt.adoc +++ b/src/sscofpmt.adoc @@ -1,5 +1,5 @@ [[Sscofpmf]] -== "Sscofpmf" Count Overflow and Mode-Based Filtering Extension +== "Sscofpmf" Count Overflow and Mode-Based Filtering Extension, Version 1.0.0 The current Privileged specification defines mhpmevent CSRs to select and control event counting by the associated hpmcounter CSRs, but provides no -- cgit v1.1 From b72d233d542a75d272bf343e38285165ccaddd29 Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 5 Mar 2024 09:57:54 -0500 Subject: Fixing formatting
Fixing the monospaced formatting on some text. --- src/sscofpmt.adoc | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/src/sscofpmt.adoc b/src/sscofpmt.adoc index e5d48b8..101c15f 100644 --- a/src/sscofpmt.adoc +++ b/src/sscofpmt.adoc @@ -33,9 +33,7 @@ interrupt that is assigned to bit 13 in the mip/mie/sip/sie registers. This extension expands the hardware performance monitor description and extends the mhpmevent registers to 64 bits (in RV32) as follows: -The hardware performance monitor includes 29 additional 64-bit event counters -and 29 associated 64-bit event selector registers - the -mhpmcounter3–mhpmcounter31 and mhpmevent3–mhpmevent31 CSRs. +The hardware performance monitor includes 29 additional 64-bit event counters and 29 associated 64-bit event selector registers - the mhpmcounter3–mhpmcounter31 and mhpmevent3–mhpmevent31 CSRs. The mhpmcounters are WARL registers that support up to 64 bits of precision on RV32 and RV64. @@ -72,8 +70,8 @@ bit [57] 0 - Reserved for possible future modes bit [56] 0 - Reserved for possible future modes -Each of the five `x`INH bits, when set, inhibit counting of events while in -privilege mode `x`. All-zeroes for these bits results in counting of events in +Each of the five ``x``INH bits, when set, inhibit counting of events while in +privilege mode ``x``. All-zeroes for these bits results in counting of events in all modes. The OF bit is set when the corresponding hpmcounter overflows, and remains set -- cgit v1.1 From 6902a56f3e6bdffbaa790f3bc43237fbc5d69cf1 Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 7 Mar 2024 08:50:37 -0500 Subject: Applying Ved's reg diags and shortnames Applying Ved's reg diagrams and shortnames. 
--- src/smstateen.adoc | 173 +++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 120 insertions(+), 53 deletions(-) diff --git a/src/smstateen.adoc b/src/smstateen.adoc index b92f96e..1becce9 100644 --- a/src/smstateen.adoc +++ b/src/smstateen.adoc @@ -203,59 +203,126 @@ attempts to access the affected `sstateen` CSR from S-mode, ignoring writes and returning zero for reads. Bit 63 of each `hstateen` CSR is always writable (not read-only). -Initially, the following bits are defined in `mstateen0`, -`hstateen0`, and `sstateen0`: - -bit 0 - Custom state - -bit 1 - `fcsr` for Zfinx and related extensions (Zdinx, etc.) - -Bit 0 controls access to any and all custom state. - -(Bit 0 of these registers is not custom state itself; it is a standard field of -a standard CSR, either `mstateen0`, `hstateen0`, or `sstateen0`. The requirements -that non-standard extensions must meet to be _conforming_ are not relaxed due -solely to changes in the value of this bit. In particular, if software sets -this bit but does not execute any custom instructions or access any custom -state, the software must continue to execute as specified by all relevant -RISC-V standards, or the hardware is not standard-conforming.) - -Bit 1 applies only for the case when floating-point instructions operate on `x` -registers instead of `f` registers. Whenever `misa`.F = 1, bit 1 of `mstateen0` is -read-only zero (and hence read-only zero in `hstateen0` and `sstateen0` too). For -convenience, when the `stateen` CSRs are implemented and `misa`.F = 0, then if bit -1 of a controlling `stateen0` CSR is zero, _all_ floating-point instructions -cause an illegal instruction trap (or virtual instruction trap, if relevant), -as though they all access `fcsr`, regardless of whether they really do. 
- -In addition to the bits listed above for user-accessible state, the following -are also defined initially for `mstateen0`: - -bit 57 - `hcontext`, `scontext` - -bits 60:58 - Reserved for the RISC-V Advanced Interrupt Architecture - -bit 61 - Reserved for possible `henvcfg2`/`henvcfg2h`, `senvcfg2` - -bit 62 - `henvcfg`/`henvcfgh`, `senvcfg` - -bit 63 - `hstateen0`/`hstateen0h`, `sstateen0` - -The bits defined initially for `hstateen0` are the same as those for `mstateen0` -except applying only to state that is accessible in VS-mode: - -bit 57 - `scontext` - -bits 60:58 - Reserved for the RISC-V Advanced Interrupt Architecture - -bit 61 - Reserved for a possible `senvcfg2` - -bit 62 - `senvcfg` - -bit 63 - `sstateen0` - -(Setting `hstateen0` bit 58 to zero prevents a virtual machine from accessing the -hart's IMSIC the same as setting `hstatus`.VGEIN = 0.) +[wavedrom, , ] +.... +{reg: [ +{bits: 1, name: 'C'}, +{bits: 1, name: 'FCSR'}, +{bits: 1, name: 'JVT'}, +{bits: 61: name: WPRI} +], config:{lanes: 1, hspace:1024}} +.... + +The C bit controls access to any and all custom state. + +[NOTE] +Bit 0 of these registers is not custom state itself; it is a standard field of +a standard CSR, either mstateen0, hstateen0, or sstateen0. The +requirements that non-standard extensions must meet to be conforming are not +relaxed due solely to changes in the value of this bit. In particular, if +software sets this bit but does not execute any custom instructions or access +any custom state, the software must continue to execute as specified by all +relevant RISC-V standards, or the hardware is not standard-conforming. +The FCSR bit controls access to fcsr for the case when floating-point +instructions operate on x registers instead of f registers as specified by +the Zfinx and related extensions (Zdinx, etc.). Whenever misa.F = 1, bit 1 of +mstateen0 is read-only zero (and hence read-only zero in hstateen0 and +sstateen0 too). 
For convenience, when the stateen CSRs are implemented and +misa.F = 0, then if bit 1 of a controlling stateen0 CSR is zero, all +floating-point instructions cause an illegal instruction trap (or virtual +instruction trap, if relevant), as though they all access fcsr, regardless of +whether they really do. + +The JVT controls access to the JVT CSR provided by the Zcmt extension. + +== Machine State Enable Register (mstateen0) + +[wavedrom, , ] +.... +{reg: [ +{bits: 1, name: 'C'}, +{bits: 1, name: 'FCSR'}, +{bits: 1, name: 'JVT'}, +{bits: 53: name: WPRI} +{bits: 1: name: P1P13} +{bits: 1: name: CONTEXT} +{bits: 1: name: IMSIC} +{bits: 1: name: AIA} +{bits: 1: name: CSRIND} +{bits: 1: name: WPRI} +{bits: 1: name: ENVCFG} +{bits: 1: name: SE0} +], config:{lanes: 1, hspace:1024}} +.... + +The C, FCSR, and the JVT bits control access to the same state as +controllled by the same bits in sstateen0 CSR. + +The SE0 bit in mstateen0 controls access to the hstateen0, hstateen0h, +and the sstateen0 CSRs. + +The ENVCFG bit in mstateen0 controls access to the henvcfg, henvcfgh, +and the senvcfg CSRs. + +The CSRIND bit in mstateen0 controls access to the siselect, sireg*, +vsiselect, and the vsireg* CSRs provided by the Sscsrind extensions. + +The IMSIC bit in mstateen0 controls access to the IMSIC state, including +CSRs stopei and vstopei, provided by the Ssaia extension. + +The AIA bit in mstateen0 controls access to all state introduced by the +Ssaia extension and is not controlled by either the CSRIND or the IMSIC +bits. + +The CONTEXT bit in mstateen0 controls access to the scontext and +hcontext CSRs provided by the Sdtrig ISA extension. + +The P1P13 bit in mstateen0 controls access to the hedelegh introduced by +Privileged Specification Version 1.13. + +== Hypervisor State Enable Register (hstateen0) + +[wavedrom, , ] +.... 
+{reg: [ +{bits: 1, name: 'C'}, +{bits: 1, name: 'FCSR'}, +{bits: 1, name: 'JVT'}, +{bits: 54: name: WPRI} +{bits: 1: name: CONTEXT} +{bits: 1: name: IMSIC} +{bits: 1: name: AIA} +{bits: 1: name: CSRIND} +{bits: 1: name: WPRI} +{bits: 1: name: ENVCFG} +{bits: 1: name: SE0} +], config:{lanes: 1, hspace:1024}} +.... + +The C, FCSR, and the JVT bits control access, in VS- and VU-mode, to the +same state as controllled by the same bits in sstateen0 CSR. + +The SE0 bit in hstateen0 controls access to the sstateen0 CSR. + +The ENVCFG bit in hstateen0 controls access to the senvcfg CSRs. +The CSRIND bit in hstateen0 controls access to the siselect and the +sireg*, (really vsiselect and vsireg*) CSRs provided by the +Sscsrind extensions. + +The IMSIC bit in hstateen0 controls access to the guest IMSIC state, +including CSRs stopei (really vstopei), provided by the Ssaia extension. + +[NOTE] +Setting the IMSIC bit in hstateen0 to zero prevents a virtual machine from +accessing the hart's IMSIC the same as setting hstatus.VGEIN = 0. +The AIA bit in hstateen0 controls access to all state introduced by the +Ssaia extension and is not controlled by either the CSRIND or the IMSIC +bits of hstateen0. + +The CONTEXT bit in hstateen0 controls access to the scontext CSR +provided by the Sdtrig ISA extension. + +The CONTEXT bit in hstateen0 controls access to the scontext CSR. === Usage -- cgit v1.1 From 0e7fed17d0457fa0001cad500da73d5ac443fd1b Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 7 Mar 2024 09:22:26 -0500 Subject: Fixing wavedrom diags and heading levels Fixed the wavedrom diagrams and set the heading levels. 
--- src/smstateen.adoc | 54 ++++++++++++++++++++++++++++-------------------------- 1 file changed, 28 insertions(+), 26 deletions(-) diff --git a/src/smstateen.adoc b/src/smstateen.adoc index 1becce9..d23f481 100644 --- a/src/smstateen.adoc +++ b/src/smstateen.adoc @@ -203,19 +203,20 @@ attempts to access the affected `sstateen` CSR from S-mode, ignoring writes and returning zero for reads. Bit 63 of each `hstateen` CSR is always writable (not read-only). -[wavedrom, , ] +[wavedrom, ,svg] .... {reg: [ {bits: 1, name: 'C'}, {bits: 1, name: 'FCSR'}, {bits: 1, name: 'JVT'}, -{bits: 61: name: WPRI} -], config:{lanes: 1, hspace:1024}} +{bits: 61, name: 'WPRI'} +], config:{bits: 64, lanes: 4, hspace:1024}} .... The C bit controls access to any and all custom state. [NOTE] +==== Bit 0 of these registers is not custom state itself; it is a standard field of a standard CSR, either mstateen0, hstateen0, or sstateen0. The requirements that non-standard extensions must meet to be conforming are not @@ -232,27 +233,28 @@ misa.F = 0, then if bit 1 of a controlling stateen0 CSR is zero, all floating-point instructions cause an illegal instruction trap (or virtual instruction trap, if relevant), as though they all access fcsr, regardless of whether they really do. +==== The JVT controls access to the JVT CSR provided by the Zcmt extension. -== Machine State Enable Register (mstateen0) +=== Machine State Enable Register (mstateen0) -[wavedrom, , ] +[wavedrom, ,svg] .... 
{reg: [ {bits: 1, name: 'C'}, {bits: 1, name: 'FCSR'}, {bits: 1, name: 'JVT'}, -{bits: 53: name: WPRI} -{bits: 1: name: P1P13} -{bits: 1: name: CONTEXT} -{bits: 1: name: IMSIC} -{bits: 1: name: AIA} -{bits: 1: name: CSRIND} -{bits: 1: name: WPRI} -{bits: 1: name: ENVCFG} -{bits: 1: name: SE0} -], config:{lanes: 1, hspace:1024}} +{bits: 53, name: 'WPRI'}, +{bits: 1, name: 'P1P13'}, +{bits: 1, name: 'CONTEXT'}, +{bits: 1, name: 'IMSIC'}, +{bits: 1, name: 'AIA'}, +{bits: 1, name: 'CSRIND'}, +{bits: 1, name: 'WPRI'}, +{bits: 1, name: 'ENVCFG'}, +{bits: 1, name: 'SE0'}, +], config: {bits: 64, lanes: 4, hspace:1024}} .... The C, FCSR, and the JVT bits control access to the same state as @@ -280,23 +282,23 @@ hcontext CSRs provided by the Sdtrig ISA extension. The P1P13 bit in mstateen0 controls access to the hedelegh introduced by Privileged Specification Version 1.13. -== Hypervisor State Enable Register (hstateen0) +=== Hypervisor State Enable Register (hstateen0) -[wavedrom, , ] +[wavedrom, ,svg] .... {reg: [ {bits: 1, name: 'C'}, {bits: 1, name: 'FCSR'}, {bits: 1, name: 'JVT'}, -{bits: 54: name: WPRI} -{bits: 1: name: CONTEXT} -{bits: 1: name: IMSIC} -{bits: 1: name: AIA} -{bits: 1: name: CSRIND} -{bits: 1: name: WPRI} -{bits: 1: name: ENVCFG} -{bits: 1: name: SE0} -], config:{lanes: 1, hspace:1024}} +{bits: 54, name: 'WPRI'}, +{bits: 1, name: 'CONTEXT'}, +{bits: 1, name: 'IMSIC'}, +{bits: 1, name: 'AIA'}, +{bits: 1, name: 'CSRIND'}, +{bits: 1, name: 'WPRI'}, +{bits: 1, name: 'ENVCFG'}, +{bits: 1, name: 'SE0'}, +], config: {bits: 64, lanes: 4, hspace:1024}} .... The C, FCSR, and the JVT bits control access, in VS- and VU-mode, to the -- cgit v1.1 From 461ee5e2f918e8e4fc0fe1e1c60da8cb19f368f8 Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 7 Mar 2024 09:27:52 -0500 Subject: Remove line 325 as it's a repeat As per Ved, remove line 325 as it repeats line 322. 
--- src/smstateen.adoc | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/smstateen.adoc b/src/smstateen.adoc index d23f481..a381d2b 100644 --- a/src/smstateen.adoc +++ b/src/smstateen.adoc @@ -301,7 +301,7 @@ Privileged Specification Version 1.13. ], config: {bits: 64, lanes: 4, hspace:1024}} .... -The C, FCSR, and the JVT bits control access, in VS- and VU-mode, to the +The C, FCSR, and the JVT bits control access to the same state as controllled by the same bits in sstateen0 CSR. The SE0 bit in hstateen0 controls access to the sstateen0 CSR. @@ -324,8 +324,6 @@ bits of hstateen0. The CONTEXT bit in hstateen0 controls access to the scontext CSR provided by the Sdtrig ISA extension. -The CONTEXT bit in hstateen0 controls access to the scontext CSR. - === Usage After the writable bits of the machine-level `mstateen` CSRs are initialized to -- cgit v1.1 From fb5f1499edcb5ffdff3f646a450158fd08e4f1d8 Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 7 Mar 2024 09:35:57 -0500 Subject: Remove repeated line and add admonition formatting. Remove repeated line and add admonition formatting. --- src/smstateen.adoc | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/smstateen.adoc b/src/smstateen.adoc index a381d2b..7fcca43 100644 --- a/src/smstateen.adoc +++ b/src/smstateen.adoc @@ -315,14 +315,13 @@ The IMSIC bit in hstateen0 controls access to the guest IMSIC state, including CSRs stopei (really vstopei), provided by the Ssaia extension. [NOTE] +==== Setting the IMSIC bit in hstateen0 to zero prevents a virtual machine from accessing the hart's IMSIC the same as setting hstatus.VGEIN = 0. The AIA bit in hstateen0 controls access to all state introduced by the Ssaia extension and is not controlled by either the CSRIND or the IMSIC bits of hstateen0. - -The CONTEXT bit in hstateen0 controls access to the scontext CSR -provided by the Sdtrig ISA extension.
+==== === Usage -- cgit v1.1 From 65cbd1cd7018cb4e8be341f4fad27adee029dd45 Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 7 Mar 2024 12:20:50 -0500 Subject: Adding Ved's fixes. Adding Ved's fixes as per comments in GH. --- src/smstateen.adoc | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/src/smstateen.adoc b/src/smstateen.adoc index 7fcca43..dd66197 100644 --- a/src/smstateen.adoc +++ b/src/smstateen.adoc @@ -235,7 +235,7 @@ instruction trap, if relevant), as though they all access fcsr, regardless of whether they really do. ==== -The JVT controls access to the JVT CSR provided by the Zcmt extension. +The JVT bit controls access to the JVT CSR provided by the Zcmt extension. === Machine State Enable Register (mstateen0) @@ -301,8 +301,9 @@ Privileged Specification Version 1.13. ], config: {bits: 64, lanes: 4, hspace:1024}} .... -The C, FCSR, and the JVT bits control access to the -same state as controllled by the same bits in sstateen0 CSR. +The C bit controls access to any and all custom state. The FCSR and the JVT +bits control access to the same state as controlled by the same bits in the +sstateen0 CSR. The SE0 bit in hstateen0 controls access to the sstateen0 CSR. -- cgit v1.1 From c98e6a84241b6ef21cc979d4cb939938235be9bd Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 7 Mar 2024 12:50:53 -0500 Subject: Added missing text. Added text regarding CONTEXT bit in hstateen0. --- src/smstateen.adoc | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/smstateen.adoc b/src/smstateen.adoc index dd66197..f524581 100644 --- a/src/smstateen.adoc +++ b/src/smstateen.adoc @@ -257,8 +257,9 @@ The JVT bit controls access to the JVT CSR provided by the Zcmt extension. ], config: {bits: 64, lanes: 4, hspace:1024}} .... -The C, FCSR, and the JVT bits control access to the same state as -controllled by the same bits in sstateen0 CSR. +The C bit controls access to any and all custom state. 
The FCSR and the JVT +bits control access to the same state as controlled by the same bits in the +sstateen0 CSR. The SE0 bit in mstateen0 controls access to the hstateen0, hstateen0h, and the sstateen0 CSRs. @@ -324,6 +325,9 @@ Ssaia extension and is not controlled by either the CSRIND or the IMSIC bits of hstateen0. ==== +The CONTEXT bit in hstateen0 controls access to the scontext CSR +provided by the Sdtrig ISA extension. + === Usage After the writable bits of the machine-level `mstateen` CSRs are initialized to -- cgit v1.1 From c0e3faae2088e38d028997bfad2f29fd80b6c41f Mon Sep 17 00:00:00 2001 From: Andrew Waterman Date: Thu, 7 Mar 2024 14:14:27 -0800 Subject: Clarify behavior of CSR access side effects See https://lists.riscv.org/g/tech-unprivileged/message/717 --- src/zicsr.adoc | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/src/zicsr.adoc b/src/zicsr.adoc index 9648bb7..d6f6784 100644 --- a/src/zicsr.adoc +++ b/src/zicsr.adoc @@ -38,16 +38,14 @@ of the CSR, zero-extends the value to XLEN bits, and writes it to integer register _rd_. The initial value in integer register _rs1_ is treated as a bit mask that specifies bit positions to be set in the CSR. Any bit that is high in _rs1_ will cause the corresponding bit to be set -in the CSR, if that CSR bit is writable. Other bits in the CSR are not -explicitly written. +in the CSR, if that CSR bit is writable. The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the value of the CSR, zero-extends the value to XLEN bits, and writes it to integer register _rd_. The initial value in integer register _rs1_ is treated as a bit mask that specifies bit positions to be cleared in the CSR. Any bit that is high in _rs1_ will cause the corresponding bit to -be cleared in the CSR, if that CSR bit is writable. Other bits in the -CSR are not explicitly written. +be cleared in the CSR, if that CSR bit is writable. 
For both CSRRS and CSRRC, if _rs1_=`x0`, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects @@ -105,6 +103,19 @@ CSR <> summarizes the behavior of the CSR instructions with respect to whether they read and/or write the CSR. +In addition to side effects that occur as a consequence of reading or +writing a CSR, individual fields within a CSR might have side effects +when written. The CSRRW[I] instructions action side effects for all +such fields within the written CSR. The CSRRS[I] an CSRRC[I] instructions +only action side effects for fields for which the _rs1_ or _uimm_ argument +has at least one bit set corresponding to that field. +[NOTE] +==== +As of this writing, no standard CSRs have side effects on field writes. +Hence, whether a standard CSR access has any side effects can be determined +solely from the opcode. +==== + For any event or consequence that occurs due to a CSR having a particular value, if a write to the CSR gives it that value, the resulting event or consequence is said to be an _indirect effect_ of the -- cgit v1.1 From 75965fab752170631f994e9aa95a690b970cf100 Mon Sep 17 00:00:00 2001 From: Chih-Min Chao Date: Thu, 7 Mar 2024 23:51:30 -0800 Subject: fix typo for hypervisor Signed-off-by: Chih-Min Chao --- src/hypervisor.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/hypervisor.adoc b/src/hypervisor.adoc index 6d3d226..e4775b5 100644 --- a/src/hypervisor.adoc +++ b/src/hypervisor.adoc @@ -2334,7 +2334,7 @@ nonzero value (the faulting guest physical address) is written to <>; zero is not allowed. [[pseudoinsts]] -.Special pseudoinstruction values for guest-page faults. The RV32 values are used when VSXLEN=32, and the TV64 values when VSXLEN=64. +.Special pseudoinstruction values for guest-page faults. The RV32 values are used when VSXLEN=32, and the RV64 values when VSXLEN=64. 
[%autowidth,float="center",align="center",cols="<,<",options="header"] |=== |Value |Meaning -- cgit v1.1 From cdb668795d8aa8aaa14e8c6bd4e185ed4f34eea7 Mon Sep 17 00:00:00 2001 From: Andrew Waterman Date: Tue, 12 Mar 2024 15:55:55 -0700 Subject: Further clarify behavior of CSR access side effects --- src/zicsr.adoc | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/src/zicsr.adoc b/src/zicsr.adoc index d6f6784..50183a8 100644 --- a/src/zicsr.adoc +++ b/src/zicsr.adoc @@ -52,11 +52,13 @@ write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, nor raise illegal-instruction exceptions on accesses to read-only CSRs. Both CSRRS and CSRRC always read the addressed CSR and cause any read side effects regardless of -_rs1_ and _rd_ fields. Note that if _rs1_ specifies a register holding a -zero value other than `x0`, the instruction will still attempt to write -the unmodified value back to the CSR and will cause any attendant side -effects. A CSRRW with _rs1_=`x0` will attempt to write zero to the -destination CSR. +_rs1_ and _rd_ fields. +Note that if _rs1_ specifies a register other than `x0`, and that register +holds a zero value, the instruction will not action any attendant per-field +side effects, but will action any side effects caused by writing to the entire +CSR. + +A CSRRW with _rs1_=`x0` will attempt to write zero to the destination CSR. The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value @@ -114,6 +116,8 @@ has at least one bit set corresponding to that field. As of this writing, no standard CSRs have side effects on field writes. Hence, whether a standard CSR access has any side effects can be determined solely from the opcode. + +Defining CSRs with side effects on field writes is not recommended. 
==== For any event or consequence that occurs due to a CSR having a -- cgit v1.1 From c5f0e2d97f1a7a14702aa76ed8f83588a0f8b0a7 Mon Sep 17 00:00:00 2001 From: wmat Date: Wed, 13 Mar 2024 10:41:16 -0400 Subject: Replace all included insns with the actual text. The standard integration of a chapter requires that the chapter exist as a single file. This change replaces all the bitmanip insns with the text of those instructions. --- src/b-st-ext.adoc | 2722 +++++++++++++++++++++++++++++++++++++++++++++- src/insns/add_uw.adoc | 50 - src/insns/andn.adoc | 47 - src/insns/bclr.adoc | 45 - src/insns/bclri.adoc | 59 - src/insns/bext.adoc | 46 - src/insns/bexti.adoc | 59 - src/insns/binv.adoc | 45 - src/insns/binvi.adoc | 59 - src/insns/bset.adoc | 44 - src/insns/bseti.adoc | 59 - src/insns/clmul.adoc | 57 - src/insns/clmulh.adoc | 57 - src/insns/clmulr.adoc | 63 -- src/insns/clz.adoc | 52 - src/insns/clzw.adoc | 53 - src/insns/cpop.adoc | 56 - src/insns/cpopw.adoc | 49 - src/insns/ctz.adoc | 53 - src/insns/ctzw.adoc | 52 - src/insns/max.adoc | 60 - src/insns/maxu.adoc | 50 - src/insns/min.adoc | 50 - src/insns/minu.adoc | 50 - src/insns/orc_b.adoc | 51 - src/insns/orn.adoc | 47 - src/insns/pack.adoc | 46 - src/insns/packh.adoc | 47 - src/insns/packw.adoc | 49 - src/insns/rev8.adoc | 82 -- src/insns/revb.adoc | 46 - src/insns/rol.adoc | 52 - src/insns/rolw.adoc | 51 - src/insns/ror.adoc | 52 - src/insns/rori.adoc | 66 -- src/insns/roriw.adoc | 54 - src/insns/rorw.adoc | 51 - src/insns/sext_b.adoc | 43 - src/insns/sext_h.adoc | 43 - src/insns/sh1add.adoc | 46 - src/insns/sh1add_uw.adoc | 46 - src/insns/sh2add.adoc | 43 - src/insns/sh2add_uw.adoc | 48 - src/insns/sh3add.adoc | 42 - src/insns/sh3add_uw.adoc | 45 - src/insns/slli_uw.adoc | 51 - src/insns/unzip.adoc | 60 - src/insns/xnor.adoc | 47 - src/insns/xpermb.adoc | 60 - src/insns/xpermn.adoc | 60 - src/insns/zext_h.adoc | 61 -- src/insns/zip.adoc | 60 - 52 files changed, 2671 insertions(+), 2715 deletions(-) delete mode
100644 src/insns/add_uw.adoc delete mode 100644 src/insns/andn.adoc delete mode 100644 src/insns/bclr.adoc delete mode 100644 src/insns/bclri.adoc delete mode 100644 src/insns/bext.adoc delete mode 100644 src/insns/bexti.adoc delete mode 100644 src/insns/binv.adoc delete mode 100644 src/insns/binvi.adoc delete mode 100644 src/insns/bset.adoc delete mode 100644 src/insns/bseti.adoc delete mode 100644 src/insns/clmul.adoc delete mode 100644 src/insns/clmulh.adoc delete mode 100644 src/insns/clmulr.adoc delete mode 100644 src/insns/clz.adoc delete mode 100644 src/insns/clzw.adoc delete mode 100644 src/insns/cpop.adoc delete mode 100644 src/insns/cpopw.adoc delete mode 100644 src/insns/ctz.adoc delete mode 100644 src/insns/ctzw.adoc delete mode 100644 src/insns/max.adoc delete mode 100644 src/insns/maxu.adoc delete mode 100644 src/insns/min.adoc delete mode 100644 src/insns/minu.adoc delete mode 100644 src/insns/orc_b.adoc delete mode 100644 src/insns/orn.adoc delete mode 100644 src/insns/pack.adoc delete mode 100644 src/insns/packh.adoc delete mode 100644 src/insns/packw.adoc delete mode 100644 src/insns/rev8.adoc delete mode 100644 src/insns/revb.adoc delete mode 100644 src/insns/rol.adoc delete mode 100644 src/insns/rolw.adoc delete mode 100644 src/insns/ror.adoc delete mode 100644 src/insns/rori.adoc delete mode 100644 src/insns/roriw.adoc delete mode 100644 src/insns/rorw.adoc delete mode 100644 src/insns/sext_b.adoc delete mode 100644 src/insns/sext_h.adoc delete mode 100644 src/insns/sh1add.adoc delete mode 100644 src/insns/sh1add_uw.adoc delete mode 100644 src/insns/sh2add.adoc delete mode 100644 src/insns/sh2add_uw.adoc delete mode 100644 src/insns/sh3add.adoc delete mode 100644 src/insns/sh3add_uw.adoc delete mode 100644 src/insns/slli_uw.adoc delete mode 100644 src/insns/unzip.adoc delete mode 100644 src/insns/xnor.adoc delete mode 100644 src/insns/xpermb.adoc delete mode 100644 src/insns/xpermn.adoc delete mode 100644 src/insns/zext_h.adoc delete mode 
100644 src/insns/zip.adoc diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index 7bf1dc8..59abd07 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -1035,109 +1035,2729 @@ common operations in cryptographic workloads. |=== +<<< + [#insns,reftext="Instructions (in alphabetical order)"] === Instructions (in alphabetical order) -include::insns/add_uw.adoc[] + +[#insns-add_uw,reftext=Add unsigned word] +==== add.uw + +Synopsis:: +Add unsigned word + +Mnemonic:: +add.uw _rd_, _rs1_, _rs2_ + + +Pseudoinstructions:: +zext.w _rd_, _rs1_ → add.uw _rd_, _rs1_, zero + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x3b, attr: ['OP-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x0, attr: ['ADD.UW'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x04, attr: ['ADD.UW'] }, +]} +.... + +Description:: +This instruction performs an XLEN-wide addition between _rs2_ and the zero-extended least-significant word of _rs1_. + +Operation:: +[source,sail] +-- +let base = X(rs2); +let index = EXTZ(X(rs1)[31..0]); + +X(rd) = base + index; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zba (<<#zba>>) +|0.93 +|Frozen +|=== + <<< -include::insns/andn.adoc[] +[#insns-andn,reftext="AND with inverted operand"] +==== andn + +Synopsis:: +AND with inverted operand + +Mnemonic:: +andn _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x7, attr: ['ANDN']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x20, attr: ['ANDN'] }, +]} +.... + +Description:: +This instruction performs the bitwise logical AND operation between _rs1_ and the bitwise inversion of _rs2_. 
+ +Operation:: +[source,sail] +-- +X(rd) = X(rs1) & ~X(rs2); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/bclr.adoc[] +[#insns-bclr,reftext="Single-Bit Clear (Register)"] +==== bclr + +Synopsis:: +Single-Bit Clear (Register) + +Mnemonic:: +bclr _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BCLR'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x24, attr: ['BCLR/BEXT'] }, +]} +.... + +Description:: +This instruction returns _rs1_ with a single bit cleared at the index specified in _rs2_. +The index is read from the lower log2(XLEN) bits of _rs2_. + +Operation:: +[source,sail] +-- +let index = X(rs2) & (XLEN - 1); +X(rd) = X(rs1) & ~(1 << index) +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbs (<<#zbs>>) +|0.93 +|Frozen +|=== + <<< -include::insns/bclri.adoc[] +[#insns-bclri,reftext="Single-Bit Clear (Immediate)"] +==== bclri + +Synopsis:: +Single-Bit Clear (Immediate) + +Mnemonic:: +bclri _rd_, _rs1_, _shamt_ + +Encoding (RV32):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BCLRI'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'shamt' }, + { bits: 7, name: 0x24, attr: ['BCLRI'] }, +]} +.... + +Encoding (RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BCLRI'] }, + { bits: 5, name: 'rs1' }, + { bits: 6, name: 'shamt' }, + { bits: 6, name: 0x12, attr: ['BCLRI'] }, +]} +.... + +Description:: +This instruction returns _rs1_ with a single bit cleared at the index specified in _shamt_. 
+The index is read from the lower log2(XLEN) bits of _shamt_. +For RV32, the encodings corresponding to shamt[5]=1 are reserved. + +Operation:: +[source,sail] +-- +let index = shamt & (XLEN - 1); +X(rd) = X(rs1) & ~(1 << index) +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbs (<<#zbs>>) +|0.93 +|Frozen +|=== + <<< -include::insns/bext.adoc[] +[#insns-bext,reftext="Single-Bit Extract (Register)"] +==== bext + +Synopsis:: +Single-Bit Extract (Register) +// Should we describe this as a Set-if-bit-is-set? + +Mnemonic:: +bext _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['BEXT'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x24, attr: ['BCLR/BEXT'] }, +]} +.... + +Description:: +This instruction returns a single bit extracted from _rs1_ at the index specified in _rs2_. +The index is read from the lower log2(XLEN) bits of _rs2_. + +Operation:: +[source,sail] +-- +let index = X(rs2) & (XLEN - 1); +X(rd) = (X(rs1) >> index) & 1; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbs (<<#zbs>>) +|0.93 +|Frozen +|=== + <<< -include::insns/bexti.adoc[] +[#insns-bexti,reftext="Single-Bit Extract (Immediate)"] +==== bexti + +Synopsis:: +Single-Bit Extract (Immediate) + +Mnemonic:: +bexti _rd_, _rs1_, _shamt_ + +Encoding (RV32):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['BEXTI'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'shamt' }, + { bits: 7, name: 0x24, attr: ['BEXTI/BCLRI'] }, +]} +.... + +Encoding (RV64):: +[wavedrom, , svg] +.... 
+{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['BEXTI'] }, + { bits: 5, name: 'rs1' }, + { bits: 6, name: 'shamt' }, + { bits: 6, name: 0x12, attr: ['BEXTI/BCLRI'] }, +]} +.... + +Description:: +This instruction returns a single bit extracted from _rs1_ at the index specified in _shamt_. +The index is read from the lower log2(XLEN) bits of _shamt_. +For RV32, the encodings corresponding to shamt[5]=1 are reserved. + +Operation:: +[source,sail] +-- +let index = shamt & (XLEN - 1); +X(rd) = (X(rs1) >> index) & 1; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbs (<<#zbs>>) +|0.93 +|Frozen +|=== + <<< -include::insns/binv.adoc[] +[#insns-binv,reftext="Single-Bit Invert (Register)"] +==== binv + +Synopsis:: +Single-Bit Invert (Register) + +Mnemonic:: +binv _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BINV'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x34, attr: ['BINV'] }, +]} +.... + +Description:: +This instruction returns _rs1_ with a single bit inverted at the index specified in _rs2_. +The index is read from the lower log2(XLEN) bits of _rs2_. + +Operation:: +[source,sail] +-- +let index = X(rs2) & (XLEN - 1); +X(rd) = X(rs1) ^ (1 << index) +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbs (<<#zbs>>) +|0.93 +|Frozen +|=== + <<< -include::insns/binvi.adoc[] +[#insns-binvi,reftext="Single-Bit Invert (Immediate)"] +==== binvi + +Synopsis:: +Single-Bit Invert (Immediate) + +Mnemonic:: +binvi _rd_, _rs1_, _shamt_ + +Encoding (RV32):: +[wavedrom, , svg] +....
+{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BINV'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'shamt' }, + { bits: 7, name: 0x34, attr: ['BINVI'] }, +]} +.... + +Encoding (RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BINV'] }, + { bits: 5, name: 'rs1' }, + { bits: 6, name: 'shamt' }, + { bits: 6, name: 0x1a, attr: ['BINVI'] }, +]} +.... + +Description:: +This instruction returns _rs1_ with a single bit inverted at the index specified in _shamt_. +The index is read from the lower log2(XLEN) bits of _shamt_. +For RV32, the encodings corresponding to shamt[5]=1 are reserved. + +Operation:: +[source,sail] +-- +let index = shamt & (XLEN - 1); +X(rd) = X(rs1) ^ (1 << index) +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbs (<<#zbs>>) +|0.93 +|Frozen +|=== + <<< -include::insns/bset.adoc[] +[#insns-bset,reftext="Single-Bit Set (Register)"] +==== bset + +Synopsis:: +Single-Bit Set (Register) + +Mnemonic:: +bset _rd_, _rs1_,_rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BSET'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x14, attr: ['BSET'] }, +]} +.... + +Description:: +This instruction returns _rs1_ with a single bit set at the index specified in _rs2_. +The index is read from the lower log2(XLEN) bits of _rs2_. 
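The register forms of the single-bit instructions (bclr, bext, binv, bset) all mask the index to the lower log2(XLEN) bits of _rs2_ before touching the selected bit. A minimal Python sketch of that behavior follows. It assumes registers are modeled as unsigned Python integers and XLEN is passed as a parameter; the function names are illustrative and are not part of the specification.

```python
def _index(rs2: int, xlen: int) -> int:
    # Only the lower log2(XLEN) bits of rs2 select the bit position.
    return rs2 & (xlen - 1)


def _mask(xlen: int) -> int:
    # All-ones value of width XLEN, used to keep results register-sized.
    return (1 << xlen) - 1


def bclr(rs1: int, rs2: int, xlen: int = 64) -> int:
    # Clear the selected bit of rs1.
    return rs1 & ~(1 << _index(rs2, xlen)) & _mask(xlen)


def bext(rs1: int, rs2: int, xlen: int = 64) -> int:
    # Extract the selected bit of rs1 as 0 or 1.
    return (rs1 >> _index(rs2, xlen)) & 1


def binv(rs1: int, rs2: int, xlen: int = 64) -> int:
    # Invert the selected bit of rs1.
    return (rs1 ^ (1 << _index(rs2, xlen))) & _mask(xlen)


def bset(rs1: int, rs2: int, xlen: int = 64) -> int:
    # Set the selected bit of rs1.
    return (rs1 | (1 << _index(rs2, xlen))) & _mask(xlen)
```

With XLEN=64, an index of 67 behaves like an index of 3 because only the low six bits of _rs2_ are used.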
+ +Operation:: +[source,sail] +-- +let index = X(rs2) & (XLEN - 1); +X(rd) = X(rs1) | (1 << index) +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbs (<<#zbs>>) +|0.93 +|Frozen +|=== + <<< -include::insns/bseti.adoc[] +[#insns-bseti,reftext="Single-Bit Set (Immediate)"] +==== bseti + +Synopsis:: +Single-Bit Set (Immediate) + +Mnemonic:: +bseti _rd_, _rs1_,_shamt_ + +Encoding (RV32):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BSETI'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'shamt' }, + { bits: 7, name: 0x14, attr: ['BSETI'] }, +]} +.... + +Encoding (RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['BSETI'] }, + { bits: 5, name: 'rs1' }, + { bits: 6, name: 'shamt' }, + { bits: 6, name: 0x0a, attr: ['BSETI'] }, +]} +.... + +Description:: +This instruction returns _rs1_ with a single bit set at the index specified in _shamt_. +The index is read from the lower log2(XLEN) bits of _shamt_. +For RV32, the encodings corresponding to shamt[5]=1 are reserved. + +Operation:: +[source,sail] +-- +let index = shamt & (XLEN - 1); +X(rd) = X(rs1) | (1 << index) +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbs (<<#zbs>>) +|0.93 +|Frozen +|=== + <<< -include::insns/clmul.adoc[] +[#insns-clmul,reftext="Carry-less multiply (low-part)"] +==== clmul + +Synopsis:: +Carry-less multiply (low-part) + +Mnemonic:: +clmul _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['CLMUL'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x5, attr: ['MINMAX/CLMUL'] }, +]} +.... 
+ +Description:: +clmul produces the lower half of the 2·XLEN carry-less product. + +Operation:: +[source,sail] +-- +let rs1_val = X(rs1); +let rs2_val = X(rs2); +let output : xlenbits = 0; + +foreach (i from 0 to (xlen - 1) by 1) { + output = if ((rs2_val >> i) & 1) + then output ^ (rs1_val << i); + else output; +} + +X[rd] = output +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbc (<<#zbc>>) +|0.93 +|Frozen + +|Zbkc (<<#zbkc>>) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/clmulh.adoc[] +[#insns-clmulh,reftext="Carry-less multiply (high-part)"] +==== clmulh + +Synopsis:: +Carry-less multiply (high-part) + +Mnemonic:: +clmulh _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x3, attr: ['CLMULH'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x5, attr: ['MINMAX/CLMUL'] }, +]} +.... + +Description:: +clmulh produces the upper half of the 2·XLEN carry-less product. + +Operation:: +[source,sail] +-- +let rs1_val = X(rs1); +let rs2_val = X(rs2); +let output : xlenbits = 0; + +foreach (i from 1 to xlen by 1) { + output = if ((rs2_val >> i) & 1) + then output ^ (rs1_val >> (xlen - i)); + else output; +} + +X[rd] = output +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbc (<<#zbc>>) +|0.93 +|Frozen + +|Zbkc (<<#zbkc>>) +|v0.9.4 +|Frozen +|=== + + <<< -include::insns/clmulr.adoc[] +[#insns-clmulr,reftext="Carry-less multiply (reversed)"] +==== clmulr + +Synopsis:: +Carry-less multiply (reversed) + +Mnemonic:: +clmulr _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x2, attr: ['CLMULR'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x5, attr: ['MINMAX/CLMUL'] }, +]} +.... 
+ +Description:: +*clmulr* produces bits 2·XLEN−2:XLEN-1 of the 2·XLEN carry-less +product. + +Operation:: +[source,sail] +-- +let rs1_val = X(rs1); +let rs2_val = X(rs2); +let output : xlenbits = 0; + +foreach (i from 0 to (xlen - 1) by 1) { + output = if ((rs2_val >> i) & 1) + then output ^ (rs1_val >> (xlen - i - 1)); + else output; +} + +X[rd] = output +-- + +.Note +[NOTE, caption="A" ] +=============================================================== +The *clmulr* instruction is used to accelerate CRC calculations. +The *r* in the instruction's mnemonic stands for _reversed_, as the +instruction is equivalent to bit-reversing the inputs, performing +a *clmul*, then bit-reversing the output. +=============================================================== + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbc (<<#zbc>>) +|0.93 +|Frozen +|=== + <<< -include::insns/clz.adoc[] +[#insns-clz,reftext="Count leading zero bits"] +==== clz + +Synopsis:: +Count leading zero bits + +Mnemonic:: +clz _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['CLZ'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 0x0, attr: ['CLZ'] }, + { bits: 7, name: 0x30, attr: ['CLZ'] }, +]} +.... + +Description:: +This instruction counts the number of 0's before the first 1, starting at the most-significant bit (i.e., XLEN-1) and progressing to bit 0. Accordingly, if the input is 0, the output is XLEN, and if the most-significant bit of the input is a 1, the output is 0. + +Operation:: +[source,sail] +-- +val HighestSetBit : forall ('N : Int), 'N >= 0. 
bits('N) -> int + +function HighestSetBit x = { + foreach (i from (xlen - 1) to 0 by 1 in dec) + if [x[i]] == 0b1 then return(i) else (); + return -1; +} + +let rs = X(rs); +X[rd] = (xlen - 1) - HighestSetBit(rs); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + <<< -include::insns/clzw.adoc[] +[#insns-clzw,reftext="Count leading zero bits in word"] +==== clzw + +Synopsis:: +Count leading zero bits in word + +Mnemonic:: +clzw _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x1b, attr: ['OP-IMM-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['CLZW'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 0x0, attr: ['CLZW'] }, + { bits: 7, name: 0x30, attr: ['CLZW'] }, +]} +.... + +Description:: +This instruction counts the number of 0's before the first 1, starting at bit 31 and progressing to bit 0. +Accordingly, if the least-significant word is 0, the output is 32, and if the most-significant bit of the word (i.e., bit 31) is a 1, the output is 0. + +Operation:: +[source,sail] +-- +val HighestSetBit32 : forall ('N : Int), 'N >= 0. bits('N) -> int + +function HighestSetBit32 x = { + foreach (i from 31 to 0 by 1 in dec) + if [x[i]] == 0b1 then return(i) else (); + return -1; +} + +let rs = X(rs); +X[rd] = 31 - HighestSetBit32(rs); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + <<< -include::insns/cpop.adoc[] +[#insns-cpop,reftext="Count set bits"] +==== cpop + +Synopsis:: +Count set bits + +Mnemonic:: +cpop _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['CPOP'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 0x2, attr: ['CPOP'] }, + { bits: 7, name: 0x30, attr: ['CPOP'] }, +]} +....
+Description:: +This instruction counts the number of 1's (i.e., set bits) in the source register. + +Operation:: +[source,sail] +-- +let bitcount = 0; +let rs = X(rs); + +foreach (i from 0 to (xlen - 1) in inc) + if rs[i] == 0b1 then bitcount = bitcount + 1 else (); + +X[rd] = bitcount +-- + +.Software Hint +[NOTE, caption="SH" ] +=============================================================== +This operation is known as population count, popcount, sideways sum, bit summation, or Hamming weight. + +The GCC builtin function `+__builtin_popcount (unsigned int x)+` is implemented by *cpop* on RV32 and by *cpopw* on RV64. +The GCC builtin function `+__builtin_popcountl (unsigned long x)+` for LP64 is implemented by *cpop* on RV64. +=============================================================== + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + <<< -include::insns/cpopw.adoc[] +[#insns-cpopw,reftext="Count set bits in word"] +==== cpopw + +Synopsis:: +Count set bits in word + +Mnemonic:: +cpopw _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x1b, attr: ['OP-IMM-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['CPOPW'] }, + { bits: 5, name: 'rs' }, + { bits: 5, name: 0x2, attr: ['CPOPW'] }, + { bits: 7, name: 0x30, attr: ['CPOPW'] }, +]} +.... +Description:: +This instruction counts the number of 1's (i.e., set bits) in the least-significant word of the source register.
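The population-count behavior of cpop and cpopw can be sketched in Python as follows. This is an illustrative model only: it assumes the source register is an unsigned Python integer, and the function names are hypothetical rather than taken from the specification.

```python
def cpop(rs: int, xlen: int = 64) -> int:
    # Count the set bits across the full XLEN-bit register.
    return bin(rs & ((1 << xlen) - 1)).count("1")


def cpopw(rs: int) -> int:
    # Count the set bits in the least-significant 32-bit word only;
    # any bits above bit 31 are ignored.
    return bin(rs & 0xFFFF_FFFF).count("1")
```

The two differ only in how much of the register they examine, so an RV64 value with bits set in the upper word yields a smaller count from cpopw than from cpop.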
+ +Operation:: +[source,sail] +-- +let bitcount = 0; +let val = X(rs); + +foreach (i from 0 to 31 in inc) + if val[i] == 0b1 then bitcount = bitcount + 1 else (); + +X[rd] = bitcount +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + <<< -include::insns/ctz.adoc[] +[#insns-ctz,reftext="Count trailing zero bits"] +==== ctz + +Synopsis:: +Count trailing zeros + +Mnemonic:: +ctz _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['CTZ/CTZW'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 0x1, attr: ['CTZ/CTZW'] }, + { bits: 7, name: 0x30, attr: ['CTZ/CTZW'] }, +]} +.... + +Description:: +This instruction counts the number of 0's before the first 1, starting at the least-significant bit (i.e., 0) and progressing to the most-significant bit (i.e., XLEN-1). +Accordingly, if the input is 0, the output is XLEN, and if the least-significant bit of the input is a 1, the output is 0. + +Operation:: +[source,sail] +-- +val LowestSetBit : forall ('N : Int), 'N >= 0. bits('N) -> int + +function LowestSetBit x = { + foreach (i from 0 to (xlen - 1) by 1 in dec) + if [x[i]] == 0b1 then return(i) else (); + return xlen; +} + +let rs = X(rs); +X[rd] = LowestSetBit(rs); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + <<< -include::insns/ctzw.adoc[] +[#insns-ctzw,reftext="Count trailing zero bits in word"] +==== ctzw + +Synopsis:: +Count trailing zero bits in word + +Mnemonic:: +ctzw _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x1b, attr: ['OP-IMM-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['CTZ/CTZW'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 0x1, attr: ['CTZ/CTZW'] }, + { bits: 7, name: 0x30, attr: ['CTZ/CTZW'] }, +]} +.... 
+ +Description:: +This instruction counts the number of 0's before the first 1, starting at the least-significant bit (i.e., 0) and progressing to the most-significant bit of the least-significant word (i.e., 31). Accordingly, if the least-significant word is 0, the output is 32, and if the least-significant bit of the input is a 1, the output is 0. + +Operation:: +[source,sail] +-- +val LowestSetBit32 : forall ('N : Int), 'N >= 0. bits('N) -> int + +function LowestSetBit32 x = { + foreach (i from 0 to 31 by 1 in dec) + if [x[i]] == 0b1 then return(i) else (); + return 32; +} + +let rs = X(rs); +X[rd] = LowestSetBit32(rs); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + <<< -include::insns/max.adoc[] +[#insns-max,reftext="Maximum"] +==== max + +Synopsis:: +Maximum + +Mnemonic:: +max _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x6, attr: ['MAX']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x05, attr: ['MINMAX/CLMUL'] }, +]} +.... + +Description:: +This instruction returns the larger of two signed integers. + +Operation:: +[source,sail] +-- +let rs1_val = X(rs1); +let rs2_val = X(rs2); + +let result = if rs1_val <_s rs2_val + then rs2_val + else rs1_val; + +X(rd) = result; +-- + +.Software Hint +[NOTE, caption="SW"] +=============================================================== +Calculating the absolute value of a signed integer can be performed +using the following sequence: *neg rD,rS* followed by *max +rD,rS,rD*. When using this common sequence, it is suggested that they +are scheduled with no intervening instructions so that +implementations that are so optimized can fuse them together. 
+=============================================================== + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + <<< -include::insns/maxu.adoc[] +[#insns-maxu,reftext="Unsigned maximum"] +==== maxu + +Synopsis:: +Unsigned maximum + +Mnemonic:: +maxu _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x7, attr: ['MAXU']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x05, attr: ['MINMAX/CLMUL'] }, +]} +.... + +Description:: +This instruction returns the larger of two unsigned integers. + +Operation:: +[source,sail] +-- +let rs1_val = X(rs1); +let rs2_val = X(rs2); + +let result = if rs1_val <_u rs2_val + then rs2_val + else rs1_val; + +X(rd) = result; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + <<< -include::insns/min.adoc[] +[#insns-min,reftext="Minimum"] +==== min + +Synopsis:: +Minimum + +Mnemonic:: +min _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x4, attr: ['MIN']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x05, attr: ['MINMAX/CLMUL'] }, +]} +.... + +Description:: +This instruction returns the smaller of two signed integers. 
+ +Operation:: +[source,sail] +-- +let rs1_val = X(rs1); +let rs2_val = X(rs2); + +let result = if rs1_val <_s rs2_val + then rs1_val + else rs2_val; + +X(rd) = result; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + <<< -include::insns/minu.adoc[] +[#insns-minu,reftext="Unsigned minimum"] +==== minu + +Synopsis:: +Unsigned minimum + +Mnemonic:: +minu _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['MINU']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x05, attr: ['MINMAX/CLMUL'] }, +]} +.... + +Description:: +This instruction returns the smaller of two unsigned integers. + +Operation:: +[source,sail] +-- +let rs1_val = X(rs1); +let rs2_val = X(rs2); + +let result = if rs1_val <_u rs2_val + then rs1_val + else rs2_val; + +X(rd) = result; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + <<< -include::insns/orc_b.adoc[] +[#insns-orc_b,reftext="Bitwise OR-Combine, byte granule"] +==== orc.b + +Synopsis:: +Bitwise OR-Combine, byte granule + +Mnemonic:: +orc.b _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5 }, + { bits: 5, name: 'rs' }, + { bits: 12, name: 0x287 } +]} +.... + +Description:: +Combines the bits within each byte using bitwise logical OR. +This sets the bits of each byte in the result _rd_ to all zeros if no bit within the respective byte of _rs_ is set, or to all ones if any bit within the respective byte of _rs_ is set. 
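The byte-granule OR-combine described above can be modeled with a short Python sketch. It assumes the register is an unsigned Python integer and XLEN is a parameter; `orc_b` is an illustrative name, not one defined by the specification.

```python
def orc_b(rs: int, xlen: int = 64) -> int:
    # For each byte of the input: emit 0xFF if any bit in that byte is set,
    # otherwise emit 0x00.
    out = 0
    for i in range(0, xlen, 8):
        if (rs >> i) & 0xFF:
            out |= 0xFF << i
    return out
```

For example, an input whose only non-zero bytes are bytes 0, 3, and 6 produces a result with exactly those three bytes set to all ones.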
+ +Operation:: +[source,sail] +-- +let input = X(rs); +let output : xlenbits = 0; + +foreach (i from 0 to (xlen - 8) by 8) { + output[(i + 7)..i] = if input[(i + 7)..i] == 0 + then 0b00000000 + else 0b11111111; +} + +X[rd] = output; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + <<< -include::insns/orn.adoc[] +[#insns-orn,reftext="OR with inverted operand"] +==== orn + +Synopsis:: +OR with inverted operand + +Mnemonic:: +orn _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x6, attr: ['ORN']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x20, attr: ['ORN'] }, +]} +.... + +Description:: +This instruction performs the bitwise logical OR operation between _rs1_ and the bitwise inversion of _rs2_. + +Operation:: +[source,sail] +-- +X(rd) = X(rs1) | ~X(rs2); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/pack.adoc[] +[#insns-pack,reftext="Pack low halves of registers"] +==== pack + +Synopsis:: +Pack the low halves of _rs1_ and _rs2_ into _rd_. + +Mnemonic:: +pack _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + {bits: 7, name: 0x33, attr: ['OP'] }, + {bits: 5, name: 'rd'}, + {bits: 3, name: 0x4, attr:['PACK']}, + {bits: 5, name: 'rs1'}, + {bits: 5, name: 'rs2'}, + {bits: 7, name: 0x4, attr:['PACK']}, +]} +.... + +Description:: +The pack instruction packs the XLEN/2-bit lower halves of _rs1_ and _rs2_ into +_rd_, with _rs1_ in the lower half and _rs2_ in the upper half. 
+ +Operation:: +[source,sail] +-- +let lo_half : bits(xlen/2) = X(rs1)[xlen/2-1..0]; +let hi_half : bits(xlen/2) = X(rs2)[xlen/2-1..0]; +X(rd) = EXTZ(hi_half @ lo_half); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/packh.adoc[] +[#insns-packh,reftext="Pack low bytes of registers"] +==== packh + +Synopsis:: +Pack the low bytes of _rs1_ and _rs2_ into _rd_. + +Mnemonic:: +packh _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + {bits: 7, name: 0x33, attr: ['OP'] }, + {bits: 5, name: 'rd'}, + {bits: 3, name: 0x7, attr: ['PACKH']}, + {bits: 5, name: 'rs1'}, + {bits: 5, name: 'rs2'}, + {bits: 7, name: 0x4, attr: ['PACKH']}, +]} +.... + +Description:: +The packh instruction packs the least-significant bytes of +_rs1_ and _rs2_ into the 16 least-significant bits of _rd_, +zero-extending the rest of _rd_. + +Operation:: +[source,sail] +-- +let lo_half : bits(8) = X(rs1)[7..0]; +let hi_half : bits(8) = X(rs2)[7..0]; +X(rd) = EXTZ(hi_half @ lo_half); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/packw.adoc[] +[#insns-packw,reftext="Pack low 16-bits of registers (RV64)"] +==== packw + +Synopsis:: +Pack the low 16 bits of _rs1_ and _rs2_ into _rd_ on RV64. + +Mnemonic:: +packw _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ +{bits: 2, name: 0x3}, +{bits: 5, name: 0xe}, +{bits: 5, name: 'rd'}, +{bits: 3, name: 0x4}, +{bits: 5, name: 'rs1'}, +{bits: 5, name: 'rs2'}, +{bits: 7, name: 0x4}, +]} +.... + +Description:: +This instruction packs the low 16 bits of +_rs1_ and _rs2_ into the 32 least-significant bits of _rd_, +sign-extending the 32-bit result to the rest of _rd_. +This instruction only exists on RV64-based systems.
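The three pack variants differ only in the width of the pieces they combine and in how the result is extended. The following Python sketch illustrates this, assuming registers are unsigned Python integers, that packw runs on RV64, and that sign-extended results are returned as 64-bit patterns. The function names are illustrative.

```python
def pack(rs1: int, rs2: int, xlen: int = 64) -> int:
    # Low half of rs1 goes to the lower half of rd,
    # low half of rs2 goes to the upper half of rd.
    half = xlen // 2
    mask = (1 << half) - 1
    return ((rs2 & mask) << half) | (rs1 & mask)


def packh(rs1: int, rs2: int) -> int:
    # Low byte of rs1 in bits 7..0, low byte of rs2 in bits 15..8;
    # the remaining upper bits are zero.
    return ((rs2 & 0xFF) << 8) | (rs1 & 0xFF)


def packw(rs1: int, rs2: int) -> int:
    # Low 16 bits of rs1 and rs2 form a 32-bit word, which is then
    # sign-extended from bit 31 to 64 bits (RV64 only).
    word = ((rs2 & 0xFFFF) << 16) | (rs1 & 0xFFFF)
    if word & 0x8000_0000:
        word |= 0xFFFF_FFFF_0000_0000
    return word
```

Note that packh zero-extends while packw sign-extends, so a packw result with bit 31 set has all upper bits set as well.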
+ +Operation:: +[source,sail] +-- +let lo_half : bits(16) = X(rs1)[15..0]; +let hi_half : bits(16) = X(rs2)[15..0]; +X(rd) = EXTS(hi_half @ lo_half); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/rev8.adoc[] +[#insns-rev8,reftext="Byte-reverse register"] +==== rev8 + +Synopsis:: +Byte-reverse register + +Mnemonic:: +rev8 _rd_, _rs_ + +Encoding (RV32):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5 }, + { bits: 5, name: 'rs' }, + { bits: 12, name: 0x698 } +]} +.... + +Encoding (RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5 }, + { bits: 5, name: 'rs' }, + { bits: 12, name: 0x6b8 } +]} +.... + +Description:: +This instruction reverses the order of the bytes in _rs_. + +Operation:: +[source,sail] +-- +let input = X(rs); +let output : xlenbits = 0; +let j = xlen - 1; + +foreach (i from 0 to (xlen - 8) by 8) { + output[i..(i + 7)] = input[(j - 7)..j]; + j = j - 8; +} + +X[rd] = output +-- + +.Note +[NOTE, caption="A" ] +=============================================================== +The *rev8* mnemonic corresponds to different instruction encodings in RV32 and RV64. +=============================================================== + +.Software Hint +[NOTE, caption="SH" ] +=============================================================== +The byte-reverse operation is only available for the full register +width. To emulate word-sized and halfword-sized byte-reversal, +perform a `rev8 rd,rs` followed by a `srai rd,rd,K`, where K is +XLEN-32 and XLEN-16, respectively. 
+=============================================================== + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/revb.adoc[] +[#insns-revb,reftext="Reverse bits in bytes"] +==== rev.b + +Synopsis:: +Reverse the bits in each byte of a source register. + +Mnemonic:: +rev.b _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5 }, + { bits: 5, name: 'rs' }, + { bits: 12, name: 0x687 } +]} +.... + +Description:: +This instruction reverses the order of the bits in every byte of a register. + +Operation:: +[source,sail] +-- +result : xlenbits = EXTZ(0b0); +foreach (i from 0 to sizeof(xlen) by 8) { + result[i+7..i] = reverse_bits_in_byte(X(rs1)[i+7..i]); +}; +X(rd) = result; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/rol.adoc[] +[#insns-rol,reftext="Rotate left (Register)"] +==== rol + +Synopsis:: +Rotate Left (Register) + +Mnemonic:: +rol _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['ROL']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x30, attr: ['ROL'] }, +]} +.... + +Description:: +This instruction performs a rotate left of _rs1_ by the amount in least-significant log2(XLEN) bits of _rs2_. 
+ +Operation:: +[source,sail] +-- +let shamt = if xlen == 32 + then X(rs2)[4..0] + else X(rs2)[5..0]; +let result = (X(rs1) << shamt) | (X(rs1) >> (xlen - shamt)); + +X(rd) = result; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/rolw.adoc[] +[#insns-rolw,reftext="Rotate Left Word (Register)"] +==== rolw + +Synopsis:: +Rotate Left Word (Register) + +Mnemonic:: +rolw _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x3b, attr: ['OP-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['ROLW']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x30, attr: ['ROLW'] }, +]} +.... + +Description:: +This instruction performs a rotate left on the least-significant word of _rs1_ by the amount in the least-significant 5 bits of _rs2_. +The resulting word value is sign-extended by copying bit 31 to all of the more-significant bits. + +Operation:: +[source,sail] +-- +let rs1 = EXTZ(X(rs1)[31..0]); +let shamt = X(rs2)[4..0]; +let result = (rs1 << shamt) | (rs1 >> (32 - shamt)); +X(rd) = EXTS(result[31..0]); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/ror.adoc[] +[#insns-ror,reftext="Rotate right (Register)"] +==== ror + +Synopsis:: +Rotate Right (Register) + +Mnemonic:: +ror _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['ROR']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x30, attr: ['ROR'] }, +]} +.... + +Description:: +This instruction performs a rotate right of _rs1_ by the amount in the least-significant log2(XLEN) bits of _rs2_.
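The full-width rotates rol and ror can be sketched in Python as follows. The sketch assumes registers are unsigned Python integers and XLEN is a parameter; the names `rol` and `ror` mirror the mnemonics but the functions themselves are illustrative, not part of the specification.

```python
def rol(rs1: int, rs2: int, xlen: int = 64) -> int:
    # Rotate rs1 left by the low log2(XLEN) bits of rs2.
    mask = (1 << xlen) - 1
    shamt = rs2 & (xlen - 1)
    rs1 &= mask
    return ((rs1 << shamt) | (rs1 >> (xlen - shamt))) & mask


def ror(rs1: int, rs2: int, xlen: int = 64) -> int:
    # Rotate rs1 right by the low log2(XLEN) bits of rs2.
    mask = (1 << xlen) - 1
    shamt = rs2 & (xlen - 1)
    rs1 &= mask
    return ((rs1 >> shamt) | (rs1 << (xlen - shamt))) & mask
```

A rotate amount of zero returns the input unchanged, and a rotate left by n is undone by a rotate right by n.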
+ +Operation:: +[source,sail] +-- +let shamt = if xlen == 32 + then X(rs2)[4..0] + else X(rs2)[5..0]; +let result = (X(rs1) >> shamt) | (X(rs1) << (xlen - shamt)); + +X(rd) = result; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/rori.adoc[] +[#insns-rori,reftext="Rotate right (Immediate)"] +==== rori + +Synopsis:: +Rotate Right (Immediate) + +Mnemonic:: +rori _rd_, _rs1_, _shamt_ + +Encoding (RV32):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['RORI']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'shamt' }, + { bits: 7, name: 0x30, attr: ['RORI'] }, +]} +.... + +Encoding (RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['RORI']}, + { bits: 5, name: 'rs1' }, + { bits: 6, name: 'shamt' }, + { bits: 6, name: 0x18, attr: ['RORI'] }, +]} +.... + +Description:: +This instruction performs a rotate right of _rs1_ by the amount in the least-significant log2(XLEN) bits of _shamt_. +For RV32, the encodings corresponding to shamt[5]=1 are reserved. + +Operation:: +[source,sail] +-- +let shamt = if xlen == 32 + then shamt[4..0] + else shamt[5..0]; +let result = (X(rs1) >> shamt) | (X(rs1) << (xlen - shamt)); + +X(rd) = result; +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/roriw.adoc[] +[#insns-roriw,reftext="Rotate right Word (Immediate)"] +==== roriw + +Synopsis:: +Rotate Right Word by Immediate + +Mnemonic:: +roriw _rd_, _rs1_, _shamt_ + +Encoding:: +[wavedrom, , svg] +.... 
+{reg:[ + { bits: 7, name: 0x1b, attr: ['OP-IMM-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['RORIW']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'shamt' }, + { bits: 7, name: 0x30, attr: ['RORIW'] }, +]} +.... + +Description:: +This instruction performs a rotate right on the least-significant word +of _rs1_ by the amount in the least-significant log2(XLEN) bits of +_shamt_. +The resulting word value is sign-extended by copying bit 31 to all of +the more-significant bits. + + +Operation:: +[source,sail] +-- +let rs1_data = EXTZ(X(rs1)[31..0]); +let result = (rs1_data >> shamt) | (rs1_data << (32 - shamt)); +X(rd) = EXTS(result[31..0]); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/rorw.adoc[] +[#insns-rorw,reftext="Rotate right Word (Register)"] +==== rorw + +Synopsis:: +Rotate Right Word (Register) + +Mnemonic:: +rorw _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x3b, attr: ['OP-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x5, attr: ['RORW']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x30, attr: ['RORW'] }, +]} +.... + +Description:: +This instruction performs a rotate right on the least-significant word of _rs1_ by the amount in the least-significant 5 bits of _rs2_. +The resulting word is sign-extended by copying bit 31 to all of the more-significant bits.
+ +Operation:: +[source,sail] +-- +let rs1 = EXTZ(X(rs1)[31..0]); +let shamt = X(rs2)[4..0]; +let result = (rs1 >> shamt) | (rs1 << (32 - shamt)); +X(rd) = EXTS(result); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/sext_b.adoc[] +[#insns-sext_b,reftext="Sign-extend byte"] +==== sext.b + +Synopsis:: +Sign-extend byte + +Mnemonic:: +sext.b _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['SEXT.B/SEXT.H'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 0x04, attr: ['SEXT.B'] }, + { bits: 7, name: 0x30 }, +]} +.... + +Description:: +This instruction sign-extends the least-significant byte in the source to XLEN by copying the most-significant bit in the byte (i.e., bit 7) to all of the more-significant bits. + +Operation:: +[source,sail] +-- +X(rd) = EXTS(X(rs)[7..0]); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + <<< -include::insns/sext_h.adoc[] +[#insns-sext_h,reftext="Sign-extend halfword"] +==== sext.h + +Synopsis:: +Sign-extend halfword + +Mnemonic:: +sext.h _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['SEXT.B/SEXT.H'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 0x05, attr: ['SEXT.H'] }, + { bits: 7, name: 0x30 }, +]} +.... + +Description:: +This instruction sign-extends the least-significant halfword in _rs_ to XLEN by copying the most-significant bit in the halfword (i.e., bit 15) to all of the more-significant bits.
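As a rough illustration of the sign-extension described for *sext.b* and *sext.h*, the following Python sketch models both with one helper. The name `sign_extend` and its `width` and `xlen` parameters are hypothetical and chosen only for this example; registers are modelled as non-negative Python integers.

```python
def sign_extend(value: int, width: int, xlen: int = 64) -> int:
    """Hypothetical model: sign-extend the low `width` bits of value to xlen bits."""
    low = value & ((1 << width) - 1)   # keep only the byte or halfword
    sign = 1 << (width - 1)            # bit 7 for a byte, bit 15 for a halfword
    # XOR-then-subtract yields a negative number when the sign bit was set;
    # the final mask wraps it into an xlen-bit register value.
    return ((low ^ sign) - sign) & ((1 << xlen) - 1)
```

Under these assumptions, `sign_extend(x, 8)` corresponds to *sext.b* and `sign_extend(x, 16)` to *sext.h*.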
+ +Operation:: +[source,sail] +-- +X(rd) = EXTS(X(rs)[15..0]); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + + <<< -include::insns/sh1add.adoc[] +[#insns-sh1add,reftext=Shift left by 1 and add] +==== sh1add + +Synopsis:: +Shift left by 1 and add + +Mnemonic:: +sh1add _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x2, attr: ['SH1ADD'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x10, attr: ['SH1ADD'] }, +]} +.... + +Description:: +This instruction shifts _rs1_ to the left by 1 bit and adds it to _rs2_. + +Operation:: +[source,sail] +-- +X(rd) = X(rs2) + (X(rs1) << 1); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zba (<<#zba>>) +|0.93 +|Frozen +|=== + +// We have decided that this and all other instructions will not have reserved encodings for "useless encodings" +// We could follow suit of the base ISA and create HINTs if there is some recognized value for doing so + <<< -include::insns/sh1add_uw.adoc[] +[#insns-sh1add_uw,reftext=Shift unsigned word left by 1 and add] +==== sh1add.uw + +Synopsis:: +Shift unsigned word left by 1 and add + +Mnemonic:: +sh1add.uw _rd_, _rs1_, _rs2_ +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x3b, attr: ['OP-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x2, attr: ['SH1ADD.UW'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x10, attr: ['SH1ADD.UW'] }, +]} +.... + +Description:: +This instruction performs an XLEN-wide addition of two addends. +The first addend is _rs2_. The second addend is the unsigned value formed by extracting the least-significant word of _rs1_ and shifting it left by 1 place. 
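The two-addend form described for *sh1add.uw* can be sketched as follows. This is a minimal Python model under stated assumptions: the helper name `shadd_uw` and its `shift` parameter are hypothetical, and the result is wrapped to `xlen` bits to mimic register width.

```python
def shadd_uw(rs1: int, rs2: int, shift: int, xlen: int = 64) -> int:
    """Hypothetical model: zero-extend the low word of rs1, shift it, add rs2."""
    index = rs1 & 0xFFFFFFFF           # unsigned least-significant word of rs1
    base = rs2                         # first addend, used at full width
    return (base + (index << shift)) & ((1 << xlen) - 1)
```

With `shift=1` this matches the *sh1add.uw* description above: the upper 32 bits of _rs1_ are ignored and the addition wraps at XLEN.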
+ +Operation:: +[source,sail] +-- +let base = X(rs2); +let index = EXTZ(X(rs1)[31..0]); + +X(rd) = base + (index << 1); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zba (<<#zba>>) +|0.93 +|Frozen +|=== + <<< -include::insns/sh2add.adoc[] +[#insns-sh2add,reftext=Shift left by 2 and add] +==== sh2add + +Synopsis:: +Shift left by 2 and add + +Mnemonic:: +sh2add _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x4, attr: ['SH2ADD'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x10, attr: ['SH2ADD'] }, +]} +.... + +Description:: +This instruction shifts _rs1_ to the left by 2 places and adds it to _rs2_. + +Operation:: +[source,sail] +-- +X(rd) = X(rs2) + (X(rs1) << 2); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zba (<<#zba>>) +|0.93 +|Frozen +|=== + <<< -include::insns/sh2add_uw.adoc[] +[#insns-sh2add_uw,reftext=Shift unsigned word left by 2 and add] +==== sh2add.uw + +Synopsis:: +Shift unsigned word left by 2 and add + +Mnemonic:: +sh2add.uw _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x3b, attr: ['OP-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x4, attr: ['SH2ADD.UW'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x10, attr: ['SH2ADD.UW'] }, +]} +.... + +Description:: +This instruction performs an XLEN-wide addition of two addends. +The first addend is _rs2_. +The second addend is the unsigned value formed by extracting the least-significant word of _rs1_ and shifting it left by 2 places. 
+ +Operation:: +[source,sail] +-- +let base = X(rs2); +let index = EXTZ(X(rs1)[31..0]); + +X(rd) = base + (index << 2); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zba (<<#zba>>) +|0.93 +|Frozen +|=== + <<< -include::insns/sh3add.adoc[] +[#insns-sh3add,reftext=Shift left by 3 and add] +==== sh3add + +Synopsis:: +Shift left by 3 and add + +Mnemonic:: +sh3add _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x6, attr: ['SH3ADD'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x10, attr: ['SH3ADD'] }, +]} +.... + +Description:: +This instruction shifts _rs1_ to the left by 3 places and adds it to _rs2_. + +Operation:: +[source,sail] +-- +X(rd) = X(rs2) + (X(rs1) << 3); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zba (<<#zba>>) +|0.93 +|Frozen +|=== + <<< -include::insns/sh3add_uw.adoc[] +[#insns-sh3add_uw,reftext=Shift unsigned word left by 3 and add] +==== sh3add.uw + +Synopsis:: +Shift unsigned word left by 3 and add + +Mnemonic:: +sh3add.uw _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x3b, attr: ['OP-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x6, attr: ['SH3ADD.UW'] }, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x10, attr: ['SH3ADD.UW'] }, +]} +.... + +Description:: +This instruction performs an XLEN-wide addition of two addends. The first addend is _rs2_. The second addend is the unsigned value formed by extracting the least-significant word of _rs1_ and shifting it left by 3 places. 
+ +Operation:: +[source,sail] +-- +let base = X(rs2); +let index = EXTZ(X(rs1)[31..0]); + +X(rd) = base + (index << 3); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zba (<<#zba>>) +|0.93 +|Frozen +|=== + <<< -include::insns/slli_uw.adoc[] +[#insns-slli_uw,reftext="Shift-left unsigned word (Immediate)"] +==== slli.uw + +Synopsis:: +Shift-left unsigned word (Immediate) + +Mnemonic:: +slli.uw _rd_, _rs1_, _shamt_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x1b, attr: ['OP-IMM-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x1, attr: ['SLLI.UW'] }, + { bits: 5, name: 'rs1' }, + { bits: 6, name: 'shamt' }, + { bits: 6, name: 0x02, attr: ['SLLI.UW'] }, +]} +.... + +Description:: +This instruction takes the least-significant word of _rs1_, zero-extends it, and shifts it left by the immediate. + +Operation:: +[source,sail] +-- +X(rd) = (EXTZ(X(rs1)[31..0]) << shamt); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zba (<<#zba>>) +|0.93 +|Frozen +|=== + +.Architecture Explanation +[NOTE, caption="A" ] +=============================================================== +This instruction is the same as *slli* with *zext.w* performed on _rs1_ before shifting. +=============================================================== + <<< -include::insns/unzip.adoc[] +[#insns-unzip,reftext="Bit deinterleave"] +==== unzip + +Synopsis:: +Implements the inverse of the zip instruction. + +Mnemonic:: +unzip _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ +{bits: 2, name: 0x3}, +{bits: 5, name: 0x4}, +{bits: 5, name: 'rd'}, +{bits: 3, name: 0x5}, +{bits: 5, name: 'rs1'}, +{bits: 5, name: 0x1f}, +{bits: 7, name: 0x4}, +]} +.... + +Description:: +This instruction gathers bits from the high and low halves of the source +word into odd/even bit positions in the destination word. +It is the inverse of the <<insns-zip,zip>> instruction.
+This instruction is available only on RV32. + +Operation:: +[source,sail] +-- +foreach (i from 0 to xlen/2-1) { + X(rd)[i] = X(rs1)[2*i] + X(rd)[i+xlen/2] = X(rs1)[2*i+1] +} +-- + +.Software Hint +[NOTE, caption="SH" ] +=============================================================== +This instruction is useful for implementing the SHA3 cryptographic +hash function on a 32-bit architecture, as it implements the +bit-interleaving operation used to speed up the 64-bit rotations +directly. +=============================================================== + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbkb (<<#zbkb>>) (RV32) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/xnor.adoc[] +[#insns-xnor,reftext="Exclusive NOR"] +==== xnor + +Synopsis:: +Exclusive NOR + +Mnemonic:: +xnor _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x4, attr: ['XNOR']}, + { bits: 5, name: 'rs1' }, + { bits: 5, name: 'rs2' }, + { bits: 7, name: 0x20, attr: ['XNOR'] }, +]} +.... + +Description:: +This instruction performs the bit-wise exclusive-NOR operation on _rs1_ and _rs2_. + +Operation:: +[source,sail] +-- +X(rd) = ~(X(rs1) ^ X(rs2)); +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen + +|Zbkb (<<#zbkb>>) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/xpermb.adoc[] +[#insns-xpermb,reftext="Crossbar permutation (bytes)"] +==== xperm.b + +Synopsis:: +Byte-wise lookup of indices into a vector in registers. + +Mnemonic:: +xperm.b _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ +{bits: 2, name: 0x3}, +{bits: 5, name: 0xc}, +{bits: 5, name: 'rd'}, +{bits: 3, name: 0x4}, +{bits: 5, name: 'rs1'}, +{bits: 5, name: 'rs2'}, +{bits: 7, name: 0x14}, +]} +.... + +Description:: +The xperm.b instruction operates on bytes. 
+The _rs1_ register contains a vector of XLEN/8 8-bit elements. +The _rs2_ register contains a vector of XLEN/8 8-bit indexes. +The result is each element in _rs2_ replaced by the indexed element in _rs1_, +or zero if the index into _rs1_ is out of bounds. + +Operation:: +[source,sail] +-- +val xpermb_lookup : (bits(8), xlenbits) -> bits(8) +function xpermb_lookup (idx, lut) = { + (lut >> (idx @ 0b000))[7..0] +} + +function clause execute ( XPERM_B (rs2,rs1,rd)) = { + result : xlenbits = EXTZ(0b0); + foreach(i from 0 to xlen by 8) { + result[i+7..i] = xpermb_lookup(X(rs2)[i+7..i], X(rs1)); + }; + X(rd) = result; + RETIRE_SUCCESS +} +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbkx (<<#zbkx>>) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/xpermn.adoc[] +[#insns-xpermn,reftext="Crossbar permutation (nibbles)"] +==== xperm.n + +Synopsis:: +Nibble-wise lookup of indices into a vector. + +Mnemonic:: +xperm.n _rd_, _rs1_, _rs2_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ +{bits: 2, name: 0x3}, +{bits: 5, name: 0xc}, +{bits: 5, name: 'rd'}, +{bits: 3, name: 0x2}, +{bits: 5, name: 'rs1'}, +{bits: 5, name: 'rs2'}, +{bits: 7, name: 0x14}, +]} +.... + +Description:: +The xperm.n instruction operates on nibbles. +The _rs1_ register contains a vector of XLEN/4 4-bit elements. +The _rs2_ register contains a vector of XLEN/4 4-bit indexes. +The result is each element in _rs2_ replaced by the indexed element in _rs1_, +or zero if the index into _rs1_ is out of bounds.
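The nibble-wise lookup described for *xperm.n* can be illustrated with a small Python model. It is a sketch under assumptions, not the normative definition: the function name `xperm_n` and the `xlen` parameter are hypothetical, and _rs1_ is assumed to be already limited to `xlen` bits so that an out-of-range index naturally selects zero.

```python
def xperm_n(rs1: int, rs2: int, xlen: int = 64) -> int:
    """Hypothetical model of xperm.n: each nibble of rs2 indexes a nibble of rs1."""
    result = 0
    for i in range(0, xlen, 4):            # one step per 4-bit index in rs2
        idx = (rs2 >> i) & 0xF             # index taken from rs2
        # Shifting by idx*4 past the top of rs1 leaves 0, which models the
        # out-of-bounds case (only reachable when xlen is 32).
        nibble = (rs1 >> (idx * 4)) & 0xF
        result |= nibble << i
    return result
```

On RV64 every 4-bit index addresses one of the 16 nibbles, so the zero case applies only on RV32, where indexes 8 to 15 fall outside the 8-element vector.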
+ +Operation:: +[source,sail] +-- +val xpermn_lookup : (bits(4), xlenbits) -> bits(4) +function xpermn_lookup (idx, lut) = { + (lut >> (idx @ 0b00))[3..0] +} + +function clause execute ( XPERM_N (rs2,rs1,rd)) = { + result : xlenbits = EXTZ(0b0); + foreach(i from 0 to xlen by 4) { + result[i+3..i] = xpermn_lookup(X(rs2)[i+3..i], X(rs1)); + }; + X(rd) = result; + RETIRE_SUCCESS +} +-- + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbkx (<<#zbkx>>) +|v0.9.4 +|Frozen +|=== + <<< -include::insns/zext_h.adoc[] +[#insns-zext_h,reftext="Zero-extend halfword"] +==== zext.h + +Synopsis:: +Zero-extend halfword + +Mnemonic:: +zext.h _rd_, _rs_ + +Encoding (RV32):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x33, attr: ['OP'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x4, attr: ['ZEXT.H']}, + { bits: 5, name: 'rs' }, + { bits: 5, name: 0x00 }, + { bits: 7, name: 0x04 }, +]} +.... + +Encoding (RV64):: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x3b, attr: ['OP-32'] }, + { bits: 5, name: 'rd' }, + { bits: 3, name: 0x4, attr: ['ZEXT.H']}, + { bits: 5, name: 'rs' }, + { bits: 5, name: 0x00 }, + { bits: 7, name: 0x04 }, +]} +.... + +Description:: +This instruction zero-extends the least-significant halfword of the source to XLEN by inserting 0's into all of the bits more significant than 15. + +Operation:: +[source,sail] +-- +X(rd) = EXTZ(X(rs)[15..0]); +-- + +.Note +[NOTE, caption="A" ] +=============================================================== +The *zext.h* mnemonic corresponds to different instruction encodings in RV32 and RV64. 
+=============================================================== + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbb (<<#zbb>>) +|0.93 +|Frozen +|=== + <<< -include::insns/zip.adoc[] +[#insns-zip,reftext="Bit interleave"] +==== zip + +Synopsis:: +Gather odd and even bits of the source word into upper/lower halves of the +destination. + +Mnemonic:: +zip _rd_, _rs_ + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ +{bits: 2, name: 0x3}, +{bits: 5, name: 0x4}, +{bits: 5, name: 'rd'}, +{bits: 3, name: 0x1}, +{bits: 5, name: 'rs1'}, +{bits: 5, name: 0x1e}, +{bits: 7, name: 0x4}, +]} +.... + +Description:: +This instruction scatters all of the odd and even bits of a source word into +the high and low halves of a destination word. +It is the inverse of the <<insns-unzip,unzip>> instruction. +This instruction is available only on RV32. + +Operation:: +[source,sail] +-- +foreach (i from 0 to xlen/2-1) { + X(rd)[2*i] = X(rs1)[i] + X(rd)[2*i+1] = X(rs1)[i+xlen/2] +} +-- + +.Software Hint +[NOTE, caption="SH" ] +=============================================================== +This instruction is useful for implementing the SHA3 cryptographic +hash function on a 32-bit architecture, as it implements the +bit-interleaving operation used to speed up the 64-bit rotations +directly. +=============================================================== + +Included in:: +[%header,cols="4,2,2"] +|=== +|Extension +|Minimum version +|Lifecycle state + +|Zbkb (<<#zbkb>>) (RV32) +|v0.9.4 +|Frozen +|=== + === Software optimization guide diff --git a/src/insns/add_uw.adoc b/src/insns/add_uw.adoc deleted file mode 100644 index a8b9588..0000000 --- a/src/insns/add_uw.adoc +++ /dev/null @@ -1,50 +0,0 @@ -[#insns-add_uw,reftext=Add unsigned word] -==== add.uw - -Synopsis:: -Add unsigned word - -Mnemonic:: -add.uw _rd_, _rs1_, _rs2_ - - -Pseudoinstructions:: -zext.w _rd_, _rs1_ → add.uw _rd_, _rs1_, zero - -Encoding:: -[wavedrom, , svg] -....
-{reg:[ - { bits: 7, name: 0x3b, attr: ['OP-32'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x0, attr: ['ADD.UW'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x04, attr: ['ADD.UW'] }, -]} -.... - -Description:: -This instruction performs an XLEN-wide addition between _rs2_ and the zero-extended least-significant word of _rs1_. - -Operation:: -[source,sail] --- -let base = X(rs2); -let index = EXTZ(X(rs1)[31..0]); - -X(rd) = base + index; --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zba (<<#zba>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/andn.adoc b/src/insns/andn.adoc deleted file mode 100644 index e0551b2..0000000 --- a/src/insns/andn.adoc +++ /dev/null @@ -1,47 +0,0 @@ -[#insns-andn,reftext="AND with inverted operand"] -==== andn - -Synopsis:: -AND with inverted operand - -Mnemonic:: -andn _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x7, attr: ['ANDN']}, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x20, attr: ['ANDN'] }, -]} -.... - -Description:: -This instruction performs the bitwise logical AND operation between _rs1_ and the bitwise inversion of _rs2_. - -Operation:: -[source,sail] --- -X(rd) = X(rs1) & ~X(rs2); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen - -|Zbkb (<<#zbkb>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/bclr.adoc b/src/insns/bclr.adoc deleted file mode 100644 index a5095fe..0000000 --- a/src/insns/bclr.adoc +++ /dev/null @@ -1,45 +0,0 @@ -[#insns-bclr,reftext="Single-Bit Clear (Register)"] -==== bclr - -Synopsis:: -Single-Bit Clear (Register) - -Mnemonic:: -bclr _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... 
-{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['BCLR'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x24, attr: ['BCLR/BEXT'] }, -]} -.... - -Description:: -This instruction returns _rs1_ with a single bit cleared at the index specified in _rs2_. -The index is read from the lower log2(XLEN) bits of _rs2_. - -Operation:: -[source,sail] --- -let index = X(rs2) & (XLEN - 1); -X(rd) = X(rs1) & ~(1 << index) --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbs (<<#zbs>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/bclri.adoc b/src/insns/bclri.adoc deleted file mode 100644 index bafc115..0000000 --- a/src/insns/bclri.adoc +++ /dev/null @@ -1,59 +0,0 @@ -[#insns-bclri,reftext="Single-Bit Clear (Immediate)"] -==== bclri - -Synopsis:: -Single-Bit Clear (Immediate) - -Mnemonic:: -bclri _rd_, _rs1_, _shamt_ - -Encoding (RV32):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['BCLRI'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'shamt' }, - { bits: 7, name: 0x24, attr: ['BCLRI'] }, -]} -.... - -Encoding (RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['BCLRI'] }, - { bits: 5, name: 'rs1' }, - { bits: 6, name: 'shamt' }, - { bits: 6, name: 0x12, attr: ['BCLRI'] }, -]} -.... - -Description:: -This instruction returns _rs1_ with a single bit cleared at the index specified in _shamt_. -The index is read from the lower log2(XLEN) bits of _shamt_. -For RV32, the encodings corresponding to shamt[5]=1 are reserved. 
- -Operation:: -[source,sail] --- -let index = shamt & (XLEN - 1); -X(rd) = X(rs1) & ~(1 << index) --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbs (<<#zbs>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/bext.adoc b/src/insns/bext.adoc deleted file mode 100644 index 22cd3fc..0000000 --- a/src/insns/bext.adoc +++ /dev/null @@ -1,46 +0,0 @@ -[#insns-bext,reftext="Single-Bit Extract (Register)"] -==== bext - -Synopsis:: -Single-Bit Extract (Register) -// Should we describe this as a Set-if-bit-is-set? - -Mnemonic:: -bext _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x5, attr: ['BEXT'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x24, attr: ['BCLR/BEXT'] }, -]} -.... - -Description:: -This instruction returns a single bit extracted from _rs1_ at the index specified in _rs2_. -The index is read from the lower log2(XLEN) bits of _rs2_. - -Operation:: -[source,sail] --- -let index = X(rs2) & (XLEN - 1); -X(rd) = (X(rs1) >> index) & 1; --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbs (<<#zbs>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/bexti.adoc b/src/insns/bexti.adoc deleted file mode 100644 index 1f58ca7..0000000 --- a/src/insns/bexti.adoc +++ /dev/null @@ -1,59 +0,0 @@ -[#insns-bexti,reftext="Single-Bit Extract (Immediate)"] -==== bexti - -Synopsis:: -Single-Bit Extract (Immediate) - -Mnemonic:: -bexti _rd_, _rs1_, _shamt_ - -Encoding (RV32):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x5, attr: ['BEXTI'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'shamt' }, - { bits: 7, name: 0x24, attr: ['BEXTI/BCLRI'] }, -]} -.... - -Encoding (RV64):: -[wavedrom, , svg] -.... 
-{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x5, attr: ['BEXTI'] }, - { bits: 5, name: 'rs1' }, - { bits: 6, name: 'shamt' }, - { bits: 6, name: 0x12, attr: ['BEXTI/BCLRI'] }, -]} -.... - -Description:: -This instruction returns a single bit extracted from _rs1_ at the index specified in _rs2_. -The index is read from the lower log2(XLEN) bits of _shamt_. -For RV32, the encodings corresponding to shamt[5]=1 are reserved. - -Operation:: -[source,sail] --- -let index = shamt & (XLEN - 1); -X(rd) = (X(rs1) >> index) & 1; --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbs (<<#zbs>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/binv.adoc b/src/insns/binv.adoc deleted file mode 100644 index 04cc930..0000000 --- a/src/insns/binv.adoc +++ /dev/null @@ -1,45 +0,0 @@ -[#insns-binv,reftext="Single-Bit Invert (Register)"] -==== binv - -Synopsis:: -Single-Bit Invert (Register) - -Mnemonic:: -binv _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['BINV'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x34, attr: ['BINV'] }, -]} -.... - -Description:: -This instruction returns _rs1_ with a single bit inverted at the index specified in _rs2_. -The index is read from the lower log2(XLEN) bits of _rs2_. 
- -Operation:: -[source,sail] --- -let index = X(rs2) & (XLEN - 1); -X(rd) = X(rs1) ^ (1 << index) --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbs (<<#zbs>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/binvi.adoc b/src/insns/binvi.adoc deleted file mode 100644 index e7ec25e..0000000 --- a/src/insns/binvi.adoc +++ /dev/null @@ -1,59 +0,0 @@ -[#insns-binvi,reftext="Single-Bit Invert (Immediate)"] -==== binvi - -Synopsis:: -Single-Bit Invert (Immediate) - -Mnemonic:: -binvi _rd_, _rs1_, _shamt_ - -Encoding (RV32):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['BINV'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'shamt' }, - { bits: 7, name: 0x34, attr: ['BINVI'] }, -]} -.... - -Encoding (RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['BINV'] }, - { bits: 5, name: 'rs1' }, - { bits: 6, name: 'shamt' }, - { bits: 6, name: 0x1a, attr: ['BINVI'] }, -]} -.... - -Description:: -This instruction returns _rs1_ with a single bit inverted at the index specified in _shamt_. -The index is read from the lower log2(XLEN) bits of _shamt_. -For RV32, the encodings corresponding to shamt[5]=1 are reserved. - -Operation:: -[source,sail] --- -let index = shamt & (XLEN - 1); -X(rd) = X(rs1) ^ (1 << index) --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbs (<<#zbs>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/bset.adoc b/src/insns/bset.adoc deleted file mode 100644 index e39fbde..0000000 --- a/src/insns/bset.adoc +++ /dev/null @@ -1,44 +0,0 @@ -[#insns-bset,reftext="Single-Bit Set (Register)"] -==== bset - -Synopsis:: -Single-Bit Set (Register) - -Mnemonic:: -bset _rd_, _rs1_,_rs2_ - -Encoding:: -[wavedrom, , svg] -.... 
-{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['BSET'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x14, attr: ['BSET'] }, -]} -.... - -Description:: -This instruction returns _rs1_ with a single bit set at the index specified in _rs2_. -The index is read from the lower log2(XLEN) bits of _rs2_. - -Operation:: -[source,sail] --- -let index = X(rs2) & (XLEN - 1); -X(rd) = X(rs1) | (1 << index) --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbs (<<#zbs>>) -|0.93 -|Frozen -|=== diff --git a/src/insns/bseti.adoc b/src/insns/bseti.adoc deleted file mode 100644 index 9d80d98..0000000 --- a/src/insns/bseti.adoc +++ /dev/null @@ -1,59 +0,0 @@ -[#insns-bseti,reftext="Single-Bit Set (Immediate)"] -==== bseti - -Synopsis:: -Single-Bit Set (Immediate) - -Mnemonic:: -bseti _rd_, _rs1_,_shamt_ - -Encoding (RV32):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['BSETI'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'shamt' }, - { bits: 7, name: 0x14, attr: ['BSETI'] }, -]} -.... - -Encoding (RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['BSETI'] }, - { bits: 5, name: 'rs1' }, - { bits: 6, name: 'shamt' }, - { bits: 6, name: 0x0a, attr: ['BSETI'] }, -]} -.... - -Description:: -This instruction returns _rs1_ with a single bit set at the index specified in _shamt_. -The index is read from the lower log2(XLEN) bits of _shamt_. -For RV32, the encodings corresponding to shamt[5]=1 are reserved. 
- -Operation:: -[source,sail] --- -let index = shamt & (XLEN - 1); -X(rd) = X(rs1) | (1 << index) --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbs (<<#zbs>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/clmul.adoc b/src/insns/clmul.adoc deleted file mode 100644 index b9976c9..0000000 --- a/src/insns/clmul.adoc +++ /dev/null @@ -1,57 +0,0 @@ -[#insns-clmul,reftext="Carry-less multiply (low-part)"] -==== clmul - -Synopsis:: -Carry-less multiply (low-part) - -Mnemonic:: -clmul _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['CLMUL'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x5, attr: ['MINMAX/CLMUL'] }, -]} -.... - -Description:: -clmul produces the lower half of the 2·XLEN carry-less product. - -Operation:: -[source,sail] --- -let rs1_val = X(rs1); -let rs2_val = X(rs2); -let output : xlenbits = 0; - -foreach (i from 0 to (xlen - 1) by 1) { - output = if ((rs2_val >> i) & 1) - then output ^ (rs1_val << i); - else output; -} - -X[rd] = output --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbc (<<#zbc>>) -|0.93 -|Frozen - -|Zbkc (<<#zbkc>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/clmulh.adoc b/src/insns/clmulh.adoc deleted file mode 100644 index e4c6d88..0000000 --- a/src/insns/clmulh.adoc +++ /dev/null @@ -1,57 +0,0 @@ -[#insns-clmulh,reftext="Carry-less multiply (high-part)"] -==== clmulh - -Synopsis:: -Carry-less multiply (high-part) - -Mnemonic:: -clmulh _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x3, attr: ['CLMULH'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x5, attr: ['MINMAX/CLMUL'] }, -]} -.... 
- -Description:: -clmulh produces the upper half of the 2·XLEN carry-less product. - -Operation:: -[source,sail] --- -let rs1_val = X(rs1); -let rs2_val = X(rs2); -let output : xlenbits = 0; - -foreach (i from 1 to xlen by 1) { - output = if ((rs2_val >> i) & 1) - then output ^ (rs1_val >> (xlen - i)); - else output; -} - -X[rd] = output --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbc (<<#zbc>>) -|0.93 -|Frozen - -|Zbkc (<<#zbkc>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/clmulr.adoc b/src/insns/clmulr.adoc deleted file mode 100644 index 1db4ca7..0000000 --- a/src/insns/clmulr.adoc +++ /dev/null @@ -1,63 +0,0 @@ -[#insns-clmulr,reftext="Carry-less multiply (reversed)"] -==== clmulr - -Synopsis:: -Carry-less multiply (reversed) - -Mnemonic:: -clmulr _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x2, attr: ['CLMULR'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x5, attr: ['MINMAX/CLMUL'] }, -]} -.... - -Description:: -*clmulr* produces bits 2·XLEN−2:XLEN-1 of the 2·XLEN carry-less -product. - -Operation:: -[source,sail] --- -let rs1_val = X(rs1); -let rs2_val = X(rs2); -let output : xlenbits = 0; - -foreach (i from 0 to (xlen - 1) by 1) { - output = if ((rs2_val >> i) & 1) - then output ^ (rs1_val >> (xlen - i - 1)); - else output; -} - -X[rd] = output --- - -.Note -[NOTE, caption="A" ] -=============================================================== -The *clmulr* instruction is used to accelerate CRC calculations. -The *r* in the instruction's mnemonic stands for _reversed_, as the -instruction is equivalent to bit-reversing the inputs, performing -a *clmul*, then bit-reversing the output. 
-=============================================================== - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbc (<<#zbc>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/clz.adoc b/src/insns/clz.adoc deleted file mode 100644 index 898d5f5..0000000 --- a/src/insns/clz.adoc +++ /dev/null @@ -1,52 +0,0 @@ -[#insns-clz,reftext="Count leading zero bits"] -==== clz - -Synopsis:: -Count leading zero bits - -Mnemonic:: -clz _rd_, _rs_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['CLZ'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 0x0, attr: ['CLZ'] }, - { bits: 7, name: 0x30, attr: ['CLZ'] }, -]} -.... - -Description:: -This instruction counts the number of 0's before the first 1, starting at the most-significant bit (i.e., XLEN-1) and progressing to bit 0. Accordingly, if the input is 0, the output is XLEN, and if the most-significant bit of the input is a 1, the output is 0. - -Operation:: -[source,sail] --- -val HighestSetBit : forall ('N : Int), 'N >= 0. bits('N) -> int - -function HighestSetBit x = { - foreach (i from (xlen - 1) to 0 by 1 in dec) - if [x[i]] == 0b1 then return(i) else (); - return -1; -} - -let rs = X(rs); -X[rd] = (xlen - 1) - HighestSetBit(rs); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/clzw.adoc b/src/insns/clzw.adoc deleted file mode 100644 index 67a64d9..0000000 --- a/src/insns/clzw.adoc +++ /dev/null @@ -1,53 +0,0 @@ -[#insns-clzw,reftext="Count leading zero bits in word"] -==== clzw - -Synopsis:: -Count leading zero bits in word - -Mnemonic:: -clzw _rd_, _rs_ - -Encoding:: -[wavedrom, , svg] -.... 
-{reg:[ - { bits: 7, name: 0x1b, attr: ['OP-IMM-32'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['CLZW'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 0x0, attr: ['CLZW'] }, - { bits: 7, name: 0x30, attr: ['CLZW'] }, -]} -.... - -Description:: -This instruction counts the number of 0's before the first 1 starting at bit 31 and progressing to bit 0. -Accordingly, if the least-significant word is 0, the output is 32, and if the most-significant bit of the word (i.e., bit 31) is a 1, the output is 0. - -Operation:: -[source,sail] --- -val HighestSetBit32 : forall ('N : Int), 'N >= 0. bits('N) -> int - -function HighestSetBit32 x = { - foreach (i from 31 to 0 by 1 in dec) - if [x[i]] == 0b1 then return(i) else (); - return -1; -} - -let rs = X(rs); -X[rd] = 31 - HighestSetBit(rs); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/cpop.adoc b/src/insns/cpop.adoc deleted file mode 100644 index 24e7a2f..0000000 --- a/src/insns/cpop.adoc +++ /dev/null @@ -1,56 +0,0 @@ -[#insns-cpop,reftext="Count set bits"] -==== cpop - -Synopsis:: -Count set bits - -Mnemonic:: -cpop _rd_, _rs_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['CPOP'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 0x2, attr: ['CPOP'] }, - { bits: 7, name: 0x30, attr: ['CPOP'] }, -]} -.... -Description:: -This instructions counts the number of 1's (i.e., set bits) in the source register. 
- -Operation:: -[source,sail] --- -let bitcount = 0; -let rs = X(rs); - -foreach (i from 0 to (xlen - 1) in inc) - if rs[i] == 0b1 then bitcount = bitcount + 1 else (); - -X[rd] = bitcount --- - -.Software Hint -[NOTE, caption="SH" ] -=============================================================== -This operations is known as population count, popcount, sideways sum, bit summation, or Hamming weight. - -The GCC builtin function `+__builtin_popcount (unsigned int x)+` is implemented by cpop on RV32 and by *cpopw* on RV64. -The GCC builtin function `+__builtin_popcountl (unsigned long x)+` for LP64 is implemented by *cpop* on RV64. -=============================================================== - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen -|=== diff --git a/src/insns/cpopw.adoc b/src/insns/cpopw.adoc deleted file mode 100644 index d61336a..0000000 --- a/src/insns/cpopw.adoc +++ /dev/null @@ -1,49 +0,0 @@ -[#insns-cpopw,reftext="Count set bits in word"] -==== cpopw - -Synopsis:: -Count set bits in word - -Mnemonic:: -cpopw _rd_, _rs_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x1b, attr: ['OP-IMM-32'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['CPOPW'] }, - { bits: 5, name: 'rs' }, - { bits: 5, name: 0x2, attr: ['CPOPW'] }, - { bits: 7, name: 0x30, attr: ['CPOPW'] }, -]} -.... -Description:: -This instructions counts the number of 1's (i.e., set bits) in the least-significant word of the source register. 
- -Operation:: -[source,sail] --- -let bitcount = 0; -let val = X(rs); - -foreach (i from 0 to 31 in inc) - if val[i] == 0b1 then bitcount = bitcount + 1 else (); - -X[rd] = bitcount --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen -|=== - - diff --git a/src/insns/ctz.adoc b/src/insns/ctz.adoc deleted file mode 100644 index 545d768..0000000 --- a/src/insns/ctz.adoc +++ /dev/null @@ -1,53 +0,0 @@ -[#insns-ctz,reftext="Count trailing zero bits"] -==== ctz - -Synopsis:: -Count trailing zeros - -Mnemonic:: -ctz _rd_, _rs_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['CTZ/CTZW'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 0x1, attr: ['CTZ/CTZW'] }, - { bits: 7, name: 0x30, attr: ['CTZ/CTZW'] }, -]} -.... - -Description:: -This instruction counts the number of 0's before the first 1, starting at the least-significant bit (i.e., 0) and progressing to the most-significant bit (i.e., XLEN-1). -Accordingly, if the input is 0, the output is XLEN, and if the least-significant bit of the input is a 1, the output is 0. - -Operation:: -[source,sail] --- -val LowestSetBit : forall ('N : Int), 'N >= 0. bits('N) -> int - -function LowestSetBit x = { - foreach (i from 0 to (xlen - 1) by 1 in dec) - if [x[i]] == 0b1 then return(i) else (); - return xlen; -} - -let rs = X(rs); -X[rd] = LowestSetBit(rs); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/ctzw.adoc b/src/insns/ctzw.adoc deleted file mode 100644 index 7442cf0..0000000 --- a/src/insns/ctzw.adoc +++ /dev/null @@ -1,52 +0,0 @@ -[#insns-ctzw,reftext="Count trailing zero bits in word"] -==== ctzw - -Synopsis:: -Count trailing zero bits in word - -Mnemonic:: -ctzw _rd_, _rs_ - -Encoding:: -[wavedrom, , svg] -.... 
-{reg:[ - { bits: 7, name: 0x1b, attr: ['OP-IMM-32'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['CTZ/CTZW'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 0x1, attr: ['CTZ/CTZW'] }, - { bits: 7, name: 0x30, attr: ['CTZ/CTZW'] }, -]} -.... - -Description:: -This instruction counts the number of 0's before the first 1, starting at the least-significant bit (i.e., 0) and progressing to the most-significant bit of the least-significant word (i.e., 31). Accordingly, if the least-significant word is 0, the output is 32, and if the least-significant bit of the input is a 1, the output is 0. - -Operation:: -[source,sail] --- -val LowestSetBit32 : forall ('N : Int), 'N >= 0. bits('N) -> int - -function LowestSetBit32 x = { - foreach (i from 0 to 31 by 1 in dec) - if [x[i]] == 0b1 then return(i) else (); - return 32; -} - -let rs = X(rs); -X[rd] = LowestSetBit32(rs); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/max.adoc b/src/insns/max.adoc deleted file mode 100644 index 621198e..0000000 --- a/src/insns/max.adoc +++ /dev/null @@ -1,60 +0,0 @@ -[#insns-max,reftext="Maximum"] -==== max - -Synopsis:: -Maximum - -Mnemonic:: -max _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x6, attr: ['MAX']}, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x05, attr: ['MINMAX/CLMUL'] }, -]} -.... - -Description:: -This instruction returns the larger of two signed integers. 
- -Operation:: -[source,sail] --- -let rs1_val = X(rs1); -let rs2_val = X(rs2); - -let result = if rs1_val <_s rs2_val - then rs2_val - else rs1_val; - -X(rd) = result; --- - -.Software Hint -[NOTE, caption="SW"] -=============================================================== -Calculating the absolute value of a signed integer can be performed -using the following sequence: *neg rD,rS* followed by *max -rD,rS,rD*. When using this common sequence, it is suggested that they -are scheduled with no intervening instructions so that -implementations that are so optimized can fuse them together. -=============================================================== - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/maxu.adoc b/src/insns/maxu.adoc deleted file mode 100644 index d2473a7..0000000 --- a/src/insns/maxu.adoc +++ /dev/null @@ -1,50 +0,0 @@ -[#insns-maxu,reftext="Unsigned maximum"] -==== maxu - -Synopsis:: -Unsigned maximum - -Mnemonic:: -maxu _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x7, attr: ['MAXU']}, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x05, attr: ['MINMAX/CLMUL'] }, -]} -.... - -Description:: -This instruction returns the larger of two unsigned integers. 
- -Operation:: -[source,sail] --- -let rs1_val = X(rs1); -let rs2_val = X(rs2); - -let result = if rs1_val <_u rs2_val - then rs2_val - else rs1_val; - -X(rd) = result; --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/min.adoc b/src/insns/min.adoc deleted file mode 100644 index 550ca69..0000000 --- a/src/insns/min.adoc +++ /dev/null @@ -1,50 +0,0 @@ -[#insns-min,reftext="Minimum"] -==== min - -Synopsis:: -Minimum - -Mnemonic:: -min _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x4, attr: ['MIN']}, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x05, attr: ['MINMAX/CLMUL'] }, -]} -.... - -Description:: -This instruction returns the smaller of two signed integers. - -Operation:: -[source,sail] --- -let rs1_val = X(rs1); -let rs2_val = X(rs2); - -let result = if rs1_val <_s rs2_val - then rs1_val - else rs2_val; - -X(rd) = result; --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/minu.adoc b/src/insns/minu.adoc deleted file mode 100644 index 8ff623d..0000000 --- a/src/insns/minu.adoc +++ /dev/null @@ -1,50 +0,0 @@ -[#insns-minu,reftext="Unsigned minimum"] -==== minu - -Synopsis:: -Unsigned minimum - -Mnemonic:: -minu _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x5, attr: ['MINU']}, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x05, attr: ['MINMAX/CLMUL'] }, -]} -.... - -Description:: -This instruction returns the smaller of two unsigned integers. 
- -Operation:: -[source,sail] --- -let rs1_val = X(rs1); -let rs2_val = X(rs2); - -let result = if rs1_val <_u rs2_val - then rs1_val - else rs2_val; - -X(rd) = result; --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/orc_b.adoc b/src/insns/orc_b.adoc deleted file mode 100644 index 2a16d18..0000000 --- a/src/insns/orc_b.adoc +++ /dev/null @@ -1,51 +0,0 @@ -[#insns-orc_b,reftext="Bitwise OR-Combine, byte granule"] -==== orc.b - -Synopsis:: -Bitwise OR-Combine, byte granule - -Mnemonic:: -orc.b _rd_, _rs_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x5 }, - { bits: 5, name: 'rs' }, - { bits: 12, name: 0x287 } -]} -.... - -Description:: -Combines the bits within each byte using bitwise logical OR. -This sets the bits of each byte in the result _rd_ to all zeros if no bit within the respective byte of _rs_ is set, or to all ones if any bit within the respective byte of _rs_ is set. - -Operation:: -[source,sail] --- -let input = X(rs); -let output : xlenbits = 0; - -foreach (i from 0 to (xlen - 8) by 8) { - output[(i + 7)..i] = if input[(i + 7)..i] == 0 - then 0b00000000 - else 0b11111111; -} - -X[rd] = output; --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen -|=== diff --git a/src/insns/orn.adoc b/src/insns/orn.adoc deleted file mode 100644 index 7a6eefb..0000000 --- a/src/insns/orn.adoc +++ /dev/null @@ -1,47 +0,0 @@ -[#insns-orn,reftext="OR with inverted operand"] -==== orn - -Synopsis:: -OR with inverted operand - -Mnemonic:: -orn _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... 
-{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x6, attr: ['ORN']}, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x20, attr: ['ORN'] }, -]} -.... - -Description:: -This instruction performs the bitwise logical OR operation between _rs1_ and the bitwise inversion of _rs2_. - -Operation:: -[source,sail] --- -X(rd) = X(rs1) | ~X(rs2); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen - -|Zbkb (<<#zbkb>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/pack.adoc b/src/insns/pack.adoc deleted file mode 100644 index 82c3b2a..0000000 --- a/src/insns/pack.adoc +++ /dev/null @@ -1,46 +0,0 @@ -[#insns-pack,reftext="Pack low halves of registers"] -==== pack - -Synopsis:: -Pack the low halves of _rs1_ and _rs2_ into _rd_. - -Mnemonic:: -pack _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - {bits: 7, name: 0x33, attr: ['OP'] }, - {bits: 5, name: 'rd'}, - {bits: 3, name: 0x4, attr:['PACK']}, - {bits: 5, name: 'rs1'}, - {bits: 5, name: 'rs2'}, - {bits: 7, name: 0x4, attr:['PACK']}, -]} -.... - -Description:: -The pack instruction packs the XLEN/2-bit lower halves of _rs1_ and _rs2_ into -_rd_, with _rs1_ in the lower half and _rs2_ in the upper half. - -Operation:: -[source,sail] --- -let lo_half : bits(xlen/2) = X(rs1)[xlen/2-1..0]; -let hi_half : bits(xlen/2) = X(rs2)[xlen/2-1..0]; -X(rd) = EXTZ(hi_half @ lo_half); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbkb (<<#zbkb>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/packh.adoc b/src/insns/packh.adoc deleted file mode 100644 index 1af719e..0000000 --- a/src/insns/packh.adoc +++ /dev/null @@ -1,47 +0,0 @@ -[#insns-packh,reftext="Pack low bytes of registers"] -==== packh - -Synopsis:: -Pack the low bytes of _rs1_ and _rs2_ into _rd_. 
- -Mnemonic:: -packh _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - {bits: 7, name: 0x33, attr: ['OP'] }, - {bits: 5, name: 'rd'}, - {bits: 3, name: 0x7, attr: ['PACKH']}, - {bits: 5, name: 'rs1'}, - {bits: 5, name: 'rs2'}, - {bits: 7, name: 0x4, attr: ['PACKH']}, -]} -.... - -Description:: -And the packh instruction packs the least-significant bytes of -_rs1_ and _rs2_ into the 16 least-significant bits of _rd_, -zero extending the rest of _rd_. - -Operation:: -[source,sail] --- -let lo_half : bits(8) = X(rs1)[7..0]; -let hi_half : bits(8) = X(rs2)[7..0]; -X(rd) = EXTZ(hi_half @ lo_half); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbkb (<<#zbkb>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/packw.adoc b/src/insns/packw.adoc deleted file mode 100644 index 78c5e1b..0000000 --- a/src/insns/packw.adoc +++ /dev/null @@ -1,49 +0,0 @@ -[#insns-packw,reftext="Pack low 16-bits of registers (RV64)"] -==== packw - -Synopsis:: -Pack the low 16-bits of _rs1_ and _rs2_ into _rd_ on RV64. - -Mnemonic:: -packw _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ -{bits: 2, name: 0x3}, -{bits: 5, name: 0xe}, -{bits: 5, name: 'rd'}, -{bits: 3, name: 0x4}, -{bits: 5, name: 'rs1'}, -{bits: 5, name: 'rs2'}, -{bits: 7, name: 0x4}, -]} -.... - -Description:: -This instruction packs the low 16 bits of -_rs1_ and _rs2_ into the 32 least-significant bits of _rd_, -sign extending the 32-bit result to the rest of _rd_. -This instruction only exists on RV64 based systems. 
- -Operation:: -[source,sail] --- -let lo_half : bits(16) = X(rs1)[15..0]; -let hi_half : bits(16) = X(rs2)[15..0]; -X(rd) = EXTS(hi_half @ lo_half); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbkb (<<#zbkb>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/rev8.adoc b/src/insns/rev8.adoc deleted file mode 100644 index 5c92550..0000000 --- a/src/insns/rev8.adoc +++ /dev/null @@ -1,82 +0,0 @@ -[#insns-rev8,reftext="Byte-reverse register"] -==== rev8 - -Synopsis:: -Byte-reverse register - -Mnemonic:: -rev8 _rd_, _rs_ - -Encoding (RV32):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x5 }, - { bits: 5, name: 'rs' }, - { bits: 12, name: 0x698 } -]} -.... - -Encoding (RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x5 }, - { bits: 5, name: 'rs' }, - { bits: 12, name: 0x6b8 } -]} -.... - -Description:: -This instruction reverses the order of the bytes in _rs_. - -Operation:: -[source,sail] --- -let input = X(rs); -let output : xlenbits = 0; -let j = xlen - 1; - -foreach (i from 0 to (xlen - 8) by 8) { - output[i..(i + 7)] = input[(j - 7)..j]; - j = j - 8; -} - -X[rd] = output --- - -.Note -[NOTE, caption="A" ] -=============================================================== -The *rev8* mnemonic corresponds to different instruction encodings in RV32 and RV64. -=============================================================== - -.Software Hint -[NOTE, caption="SH" ] -=============================================================== -The byte-reverse operation is only available for the full register -width. To emulate word-sized and halfword-sized byte-reversal, -perform a `rev8 rd,rs` followed by a `srai rd,rd,K`, where K is -XLEN-32 and XLEN-16, respectively. 
-=============================================================== - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen - -|Zbkb (<<#zbkb>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/revb.adoc b/src/insns/revb.adoc deleted file mode 100644 index 67b83d1..0000000 --- a/src/insns/revb.adoc +++ /dev/null @@ -1,46 +0,0 @@ -[#insns-revb,reftext="Reverse bits in bytes"] -==== rev.b - -Synopsis:: -Reverse the bits in each byte of a source register. - -Mnemonic:: -rev.b _rd_, _rs_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x5 }, - { bits: 5, name: 'rs' }, - { bits: 12, name: 0x687 } -]} -.... - -Description:: -This instruction reverses the order of the bits in every byte of a register. - -Operation:: -[source,sail] --- -result : xlenbits = EXTZ(0b0); -foreach (i from 0 to sizeof(xlen) by 8) { - result[i+7..i] = reverse_bits_in_byte(X(rs1)[i+7..i]); -}; -X(rd) = result; --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbkb (<<#zbkb>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/rol.adoc b/src/insns/rol.adoc deleted file mode 100644 index f937096..0000000 --- a/src/insns/rol.adoc +++ /dev/null @@ -1,52 +0,0 @@ -[#insns-rol,reftext="Rotate left (Register)"] -==== rol - -Synopsis:: -Rotate Left (Register) - -Mnemonic:: -rol _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['ROL']}, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x30, attr: ['ROL'] }, -]} -.... - -Description:: -This instruction performs a rotate left of _rs1_ by the amount in least-significant log2(XLEN) bits of _rs2_. 
- -Operation:: -[source,sail] --- -let shamt = if xlen == 32 - then X(rs2)[4..0] - else X(rs2)[5..0]; -let result = (X(rs1) << shamt) | (X(rs1) >> (xlen - shamt)); - -X(rd) = result; --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen - -|Zbkb (<<#zbkb>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/rolw.adoc b/src/insns/rolw.adoc deleted file mode 100644 index feed9a7..0000000 --- a/src/insns/rolw.adoc +++ /dev/null @@ -1,51 +0,0 @@ -[#insns-rolw,reftext="Rotate Left Word (Register)"] -==== rolw - -Synopsis:: -Rotate Left Word (Register) - -Mnemonic:: -rolw _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x3b, attr: ['OP-32'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['ROLW']}, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x30, attr: ['ROLW'] }, -]} -.... - -Description:: -This instruction performs a rotate left on the least-significant word of _rs1_ by the amount in least-significant 5 bits of _rs2_. -The resulting word value is sign-extended by copying bit 31 to all of the more-significant bits. - -Operation:: -[source,sail] --- -let rs1 = EXTZ(X(rs1)[31..0]) -let shamt = X(rs2)[4..0]; -let result = (rs1 << shamt) | (rs1 >> (32 - shamt)); -X(rd) = EXTS(result[31..0]); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen - -|Zbkb (<<#zbkb>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/ror.adoc b/src/insns/ror.adoc deleted file mode 100644 index c8a653f..0000000 --- a/src/insns/ror.adoc +++ /dev/null @@ -1,52 +0,0 @@ -[#insns-ror,reftext="Rotate right (Register)"] -==== ror - -Synopsis:: -Rotate Right - -Mnemonic:: -ror _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... 
-{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x5, attr: ['ROR']}, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x30, attr: ['ROR'] }, -]} -.... - -Description:: -This instruction performs a rotate right of _rs1_ by the amount in least-significant log2(XLEN) bits of _rs2_. - -Operation:: -[source,sail] --- -let shamt = if xlen == 32 - then X(rs2)[4..0] - else X(rs2)[5..0]; -let result = (X(rs1) >> shamt) | (X(rs1) << (xlen - shamt)); - -X(rd) = result; --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen - -|Zbkb (<<#zbkb>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/rori.adoc b/src/insns/rori.adoc deleted file mode 100644 index a63256e..0000000 --- a/src/insns/rori.adoc +++ /dev/null @@ -1,66 +0,0 @@ -[#insns-rori,reftext="Rotate right (Immediate)"] -==== rori - -Synopsis:: -Rotate Right (Immediate) - -Mnemonic:: -rori _rd_, _rs1_, _shamt_ - -Encoding (RV32):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x5, attr: ['RORI']}, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'shamt' }, - { bits: 7, name: 0x30, attr: ['RORI'] }, -]} -.... - -Encoding (RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x5, attr: ['RORI']}, - { bits: 5, name: 'rs1' }, - { bits: 6, name: 'shamt' }, - { bits: 6, name: 0x18, attr: ['RORI'] }, -]} -.... - -Description:: -This instruction performs a rotate right of _rs1_ by the amount in the least-significant log2(XLEN) bits of _shamt_. -For RV32, the encodings corresponding to shamt[5]=1 are reserved. 
- -Operation:: -[source,sail] --- -let shamt = if xlen == 32 - then shamt[4..0] - else shamt[5..0]; -let result = (X(rs1) >> shamt) | (X(rs1) << (xlen - shamt)); - -X(rd) = result; --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen - -|Zbkb (<<#zbkb>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/roriw.adoc b/src/insns/roriw.adoc deleted file mode 100644 index 65b8fd9..0000000 --- a/src/insns/roriw.adoc +++ /dev/null @@ -1,54 +0,0 @@ -[#insns-roriw,reftext="Rotate right Word (Immediate)"] -==== roriw - -Synopsis:: -Rotate Right Word by Immediate - -Mnemonic:: -roriw _rd_, _rs1_, _shamt_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x1b, attr: ['OP-IMM-32'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x5, attr: ['RORIW']}, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'shamt' }, - { bits: 7, name: 0x30, attr: ['RORIW'] }, -]} -.... - -Description:: -This instruction performs a rotate right on the least-significant word -of _rs1_ by the amount in the least-significant log2(XLEN) bits of -_shamt_. -The resulting word value is sign-extended by copying bit 31 to all of -the more-significant bits. - - -Operation:: -[source,sail] --- -let rs1_data = EXTZ(X(rs1)[31..0]; -let result = (rs1_data >> shamt) | (rs1_data << (32 - shamt)); -X(rd) = EXTS(result[31..0]); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen - -|Zbkb (<<#zbkb>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/rorw.adoc b/src/insns/rorw.adoc deleted file mode 100644 index d06d52f..0000000 --- a/src/insns/rorw.adoc +++ /dev/null @@ -1,51 +0,0 @@ -[#insns-rorw,reftext="Rotate right Word (Register)"] -==== rorw - -Synopsis:: -Rotate Right Word (Register) - -Mnemonic:: -rorw _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... 
-{reg:[ - { bits: 7, name: 0x3b, attr: ['OP-32'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x5, attr: ['RORW']}, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x30, attr: ['RORW'] }, -]} -.... - -Description:: -This instruction performs a rotate right on the least-significant word of _rs1_ by the amount in least-significant 5 bits of _rs2_. -The resultant word is sign-extended by copying bit 31 to all of the more-significant bits. - -Operation:: -[source,sail] --- -let rs1 = EXTZ(X(rs1)[31..0]) -let shamt = X(rs2)[4..0]; -let result = (rs1 >> shamt) | (rs1 << (32 - shamt)); -X(rd) = EXTS(result); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen - -|Zbkb (<<#zbkb>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/sext_b.adoc b/src/insns/sext_b.adoc deleted file mode 100644 index 87a7571..0000000 --- a/src/insns/sext_b.adoc +++ /dev/null @@ -1,43 +0,0 @@ -[#insns-sext_b,reftext="Sign-extend byte"] -==== sext.b - -Synopsis:: -Sign-extend byte - -Mnemonic:: -sext.b _rd_, _rs_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['SEXT.B/SEXT.H'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 0x04, attr: ['SEXT.B'] }, - { bits: 7, name: 0x30 }, -]} -.... - -Description:: -This instruction sign-extends the least-significant byte in the source to XLEN by copying the most-significant bit in the byte (i.e., bit 7) to all of the more-significant bits. 
- -Operation:: -[source,sail] --- -X(rd) = EXTS(X(rs)[7..0]); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/sext_h.adoc b/src/insns/sext_h.adoc deleted file mode 100644 index f7208a5..0000000 --- a/src/insns/sext_h.adoc +++ /dev/null @@ -1,43 +0,0 @@ -[#insns-sext_h,reftext="Sign-extend halfword"] -==== sext.h - -Synopsis:: -Sign-extend halfword - -Mnemonic:: -sext.h _rd_, _rs_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['SEXT.B/SEXT.H'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 0x05, attr: ['SEXT.H'] }, - { bits: 7, name: 0x30 }, -]} -.... - -Description:: -This instruction sign-extends the least-significant halfword in _rs_ to XLEN by copying the most-significant bit in the halfword (i.e., bit 15) to all of the more-significant bits. - -Operation:: -[source,sail] --- -X(rd) = EXTS(X(rs)[15..0]); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/sh1add.adoc b/src/insns/sh1add.adoc deleted file mode 100644 index 636fc54..0000000 --- a/src/insns/sh1add.adoc +++ /dev/null @@ -1,46 +0,0 @@ -[#insns-sh1add,reftext=Shift left by 1 and add] -==== sh1add - -Synopsis:: -Shift left by 1 and add - -Mnemonic:: -sh1add _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x2, attr: ['SH1ADD'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x10, attr: ['SH1ADD'] }, -]} -.... - -Description:: -This instruction shifts _rs1_ to the left by 1 bit and adds it to _rs2_. 
- -Operation:: -[source,sail] --- -X(rd) = X(rs2) + (X(rs1) << 1); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zba (<<#zba>>) -|0.93 -|Frozen -|=== - - -// We have decided that this and all other instructions will not have reserved encodings for "useless encodings" -// We could follow suit of the base ISA and create HINTs if there is some recognized value for doing so diff --git a/src/insns/sh1add_uw.adoc b/src/insns/sh1add_uw.adoc deleted file mode 100644 index 09e515d..0000000 --- a/src/insns/sh1add_uw.adoc +++ /dev/null @@ -1,46 +0,0 @@ -[#insns-sh1add_uw,reftext=Shift unsigned word left by 1 and add] -==== sh1add.uw - -Synopsis:: -Shift unsigned word left by 1 and add - -Mnemonic:: -sh1add.uw _rd_, _rs1_, _rs2_ -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x3b, attr: ['OP-32'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x2, attr: ['SH1ADD.UW'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x10, attr: ['SH1ADD.UW'] }, -]} -.... - -Description:: -This instruction performs an XLEN-wide addition of two addends. -The first addend is _rs2_. The second addend is the unsigned value formed by extracting the least-significant word of _rs1_ and shifting it left by 1 place. - -Operation:: -[source,sail] --- -let base = X(rs2); -let index = EXTZ(X(rs1)[31..0]); - -X(rd) = base + (index << 1); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zba (<<#zba>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/sh2add.adoc b/src/insns/sh2add.adoc deleted file mode 100644 index 273a5df..0000000 --- a/src/insns/sh2add.adoc +++ /dev/null @@ -1,43 +0,0 @@ -[#insns-sh2add,reftext=Shift left by 2 and add] -==== sh2add - -Synopsis:: -Shift left by 2 and add - -Mnemonic:: -sh2add _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... 
-{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x4, attr: ['SH2ADD'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x10, attr: ['SH2ADD'] }, -]} -.... - -Description:: -This instruction shifts _rs1_ to the left by 2 places and adds it to _rs2_. - -Operation:: -[source,sail] --- -X(rd) = X(rs2) + (X(rs1) << 2); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zba (<<#zba>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/sh2add_uw.adoc b/src/insns/sh2add_uw.adoc deleted file mode 100644 index 44a9ade..0000000 --- a/src/insns/sh2add_uw.adoc +++ /dev/null @@ -1,48 +0,0 @@ -[#insns-sh2add_uw,reftext=Shift unsigned word left by 2 and add] -==== sh2add.uw - -Synopsis:: -Shift unsigned word left by 2 and add - -Mnemonic:: -sh2add.uw _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x3b, attr: ['OP-32'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x4, attr: ['SH2ADD.UW'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x10, attr: ['SH2ADD.UW'] }, -]} -.... - -Description:: -This instruction performs an XLEN-wide addition of two addends. -The first addend is _rs2_. -The second addend is the unsigned value formed by extracting the least-significant word of _rs1_ and shifting it left by 2 places. 
- -Operation:: -[source,sail] --- -let base = X(rs2); -let index = EXTZ(X(rs1)[31..0]); - -X(rd) = base + (index << 2); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zba (<<#zba>>) -|0.93 -|Frozen -|=== - diff --git a/src/insns/sh3add.adoc b/src/insns/sh3add.adoc deleted file mode 100644 index 2ebc08b..0000000 --- a/src/insns/sh3add.adoc +++ /dev/null @@ -1,42 +0,0 @@ -[#insns-sh3add,reftext=Shift left by 3 and add] -==== sh3add - -Synopsis:: -Shift left by 3 and add - -Mnemonic:: -sh3add _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x6, attr: ['SH3ADD'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x10, attr: ['SH3ADD'] }, -]} -.... - -Description:: -This instruction shifts _rs1_ to the left by 3 places and adds it to _rs2_. - -Operation:: -[source,sail] --- -X(rd) = X(rs2) + (X(rs1) << 3); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zba (<<#zba>>) -|0.93 -|Frozen -|=== diff --git a/src/insns/sh3add_uw.adoc b/src/insns/sh3add_uw.adoc deleted file mode 100644 index 500c32c..0000000 --- a/src/insns/sh3add_uw.adoc +++ /dev/null @@ -1,45 +0,0 @@ -[#insns-sh3add_uw,reftext=Shift unsigned word left by 3 and add] -==== sh3add.uw - -Synopsis:: -Shift unsigned word left by 3 and add - -Mnemonic:: -sh3add.uw _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x3b, attr: ['OP-32'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x6, attr: ['SH3ADD.UW'] }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x10, attr: ['SH3ADD.UW'] }, -]} -.... - -Description:: -This instruction performs an XLEN-wide addition of two addends. The first addend is _rs2_. 
The second addend is the unsigned value formed by extracting the least-significant word of _rs1_ and shifting it left by 3 places. - -Operation:: -[source,sail] --- -let base = X(rs2); -let index = EXTZ(X(rs1)[31..0]); - -X(rd) = base + (index << 3); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zba (<<#zba>>) -|0.93 -|Frozen -|=== diff --git a/src/insns/slli_uw.adoc b/src/insns/slli_uw.adoc deleted file mode 100644 index 776d33e..0000000 --- a/src/insns/slli_uw.adoc +++ /dev/null @@ -1,51 +0,0 @@ -[#insns-slli_uw,reftext="Shift-left unsigned word (Immediate)"] -==== slli.uw - -Synopsis:: -Shift-left unsigned word (Immediate) - -Mnemonic:: -slli.uw _rd_, _rs1_, _shamt_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x1b, attr: ['OP-IMM-32'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x1, attr: ['SLLI.UW'] }, - { bits: 5, name: 'rs1' }, - { bits: 6, name: 'shamt' }, - { bits: 6, name: 0x02, attr: ['SLLI.UW'] }, -]} -.... - -Description:: -This instruction takes the least-significant word of _rs1_, zero-extends it, and shifts it left by the immediate. - -Operation:: -[source,sail] --- -X(rd) = (EXTZ(X(rs)[31..0]) << shamt); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zba (<<#zba>>) -|0.93 -|Frozen -|=== - - -.Architecture Explanation -[NOTE, caption="A" ] -=============================================================== -This instruction is the same as *slli* with *zext.w* performed on _rs1_ before shifting. -=============================================================== - - diff --git a/src/insns/unzip.adoc b/src/insns/unzip.adoc deleted file mode 100644 index c1d3644..0000000 --- a/src/insns/unzip.adoc +++ /dev/null @@ -1,60 +0,0 @@ -[#insns-unzip,reftext="Bit deinterleave"] -==== unzip - -Synopsis:: -Implements the inverse of the zip instruction. - -Mnemonic:: -unzip _rd_, _rs_ - -Encoding:: -[wavedrom, , svg] -.... 
-{reg:[ -{bits: 2, name: 0x3}, -{bits: 5, name: 0x4}, -{bits: 5, name: 'rd'}, -{bits: 3, name: 0x5}, -{bits: 5, name: 'rs1'}, -{bits: 5, name: 0x1f}, -{bits: 7, name: 0x4}, -]} -.... - -Description:: -This instruction gathers bits from the high and low halves of the source -word into odd/even bit positions in the destination word. -It is the inverse of the <> instruction. -This instruction is available only on RV32. - -Operation:: -[source,sail] --- -foreach (i from 0 to xlen/2-1) { - X(rd)[i] = X(rs1)[2*i] - X(rd)[i+xlen/2] = X(rs1)[2*i+1] -} --- - -.Software Hint -[NOTE, caption="SH" ] -=============================================================== -This instruction is useful for implementing the SHA3 cryptographic -hash function on a 32-bit architecture, as it implements the -bit-interleaving operation used to speed up the 64-bit rotations -directly. -=============================================================== - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbkb (<<#zbkb>>) (RV32) -|v0.9.4 -|Frozen -|=== - - diff --git a/src/insns/xnor.adoc b/src/insns/xnor.adoc deleted file mode 100644 index 63099e0..0000000 --- a/src/insns/xnor.adoc +++ /dev/null @@ -1,47 +0,0 @@ -[#insns-xnor,reftext="Exclusive NOR"] -==== xnor - -Synopsis:: -Exclusive NOR - -Mnemonic:: -xnor _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x4, attr: ['XNOR']}, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x20, attr: ['XNOR'] }, -]} -.... - -Description:: -This instruction performs the bit-wise exclusive-NOR operation on _rs1_ and _rs2_. 
- -Operation:: -[source,sail] --- -X(rd) = ~(X(rs1) ^ X(rs2)); --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen - -|Zbkb (<<#zbkb>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/xpermb.adoc b/src/insns/xpermb.adoc deleted file mode 100644 index 73298f8..0000000 --- a/src/insns/xpermb.adoc +++ /dev/null @@ -1,60 +0,0 @@ -[#insns-xpermb,reftext="Crossbar permutation (bytes)"] -==== xperm.b - -Synopsis:: -Byte-wise lookup of indices into a vector in registers. - -Mnemonic:: -xperm.b _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ -{bits: 2, name: 0x3}, -{bits: 5, name: 0xc}, -{bits: 5, name: 'rd'}, -{bits: 3, name: 0x4}, -{bits: 5, name: 'rs1'}, -{bits: 5, name: 'rs2'}, -{bits: 7, name: 0x14}, -]} -.... - -Description:: -The xperm.b instruction operates on bytes. -The _rs1_ register contains a vector of XLEN/8 8-bit elements. -The _rs2_ register contains a vector of XLEN/8 8-bit indexes. -The result is each element in _rs2_ replaced by the indexed element in _rs1_, -or zero if the index into _rs2_ is out of bounds. - -Operation:: -[source,sail] --- -val xpermb_lookup : (bits(8), xlenbits) -> bits(8) -function xpermb_lookup (idx, lut) = { - (lut >> (idx @ 0b000))[7..0] -} - -function clause execute ( XPERM_B (rs2,rs1,rd)) = { - result : xlenbits = EXTZ(0b0); - foreach(i from 0 to xlen by 8) { - result[i+7..i] = xpermn_lookup(X(rs2)[i+7..i], X(rs1)); - }; - X(rd) = result; - RETIRE_SUCCESS -} --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbkx (<<#zbkx>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/xpermn.adoc b/src/insns/xpermn.adoc deleted file mode 100644 index 22d9c19..0000000 --- a/src/insns/xpermn.adoc +++ /dev/null @@ -1,60 +0,0 @@ -[#insns-xpermn,reftext="Crossbar permutation (nibbles)"] -==== xperm.n - -Synopsis:: -Nibble-wise lookup of indices into a vector. 
- -Mnemonic:: -xperm.n _rd_, _rs1_, _rs2_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ -{bits: 2, name: 0x3}, -{bits: 5, name: 0xc}, -{bits: 5, name: 'rd'}, -{bits: 3, name: 0x2}, -{bits: 5, name: 'rs1'}, -{bits: 5, name: 'rs2'}, -{bits: 7, name: 0x14}, -]} -.... - -Description:: -The xperm.n instruction operates on nibbles. -The _rs1_ register contains a vector of XLEN/4 4-bit elements. -The _rs2_ register contains a vector of XLEN/4 4-bit indexes. -The result is each element in _rs2_ replaced by the indexed element in _rs1_, -or zero if the index into _rs2_ is out of bounds. - -Operation:: -[source,sail] --- -val xpermn_lookup : (bits(4), xlenbits) -> bits(4) -function xpermn_lookup (idx, lut) = { - (lut >> (idx @ 0b00))[3..0] -} - -function clause execute ( XPERM_N (rs2,rs1,rd)) = { - result : xlenbits = EXTZ(0b0); - foreach(i from 0 to xlen by 4) { - result[i+3..i] = xpermn_lookup(X(rs2)[i+3..i], X(rs1)); - }; - X(rd) = result; - RETIRE_SUCCESS -} --- - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbkx (<<#zbkx>>) -|v0.9.4 -|Frozen -|=== - diff --git a/src/insns/zext_h.adoc b/src/insns/zext_h.adoc deleted file mode 100644 index cae2105..0000000 --- a/src/insns/zext_h.adoc +++ /dev/null @@ -1,61 +0,0 @@ -[#insns-zext_h,reftext="Zero-extend halfword"] -==== zext.h - -Synopsis:: -Zero-extend halfword - -Mnemonic:: -zext.h _rd_, _rs_ - -Encoding (RV32):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x33, attr: ['OP'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x4, attr: ['ZEXT.H']}, - { bits: 5, name: 'rs' }, - { bits: 5, name: 0x00 }, - { bits: 7, name: 0x04 }, -]} -.... - -Encoding (RV64):: -[wavedrom, , svg] -.... -{reg:[ - { bits: 7, name: 0x3b, attr: ['OP-32'] }, - { bits: 5, name: 'rd' }, - { bits: 3, name: 0x4, attr: ['ZEXT.H']}, - { bits: 5, name: 'rs' }, - { bits: 5, name: 0x00 }, - { bits: 7, name: 0x04 }, -]} -.... 
- -Description:: -This instruction zero-extends the least-significant halfword of the source to XLEN by inserting 0's into all of the bits more significant than 15. - -Operation:: -[source,sail] --- -X(rd) = EXTZ(X(rs)[15..0]); --- - -.Note -[NOTE, caption="A" ] -=============================================================== -The *zext.h* mnemonic corresponds to different instruction encodings in RV32 and RV64. -=============================================================== - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbb (<<#zbb>>) -|0.93 -|Frozen -|=== diff --git a/src/insns/zip.adoc b/src/insns/zip.adoc deleted file mode 100644 index fcb5860..0000000 --- a/src/insns/zip.adoc +++ /dev/null @@ -1,60 +0,0 @@ -[#insns-zip,reftext="Bit interleave"] -==== zip - -Synopsis:: -Gather odd and even bits of the source word into upper/lower halves of the -destination. - -Mnemonic:: -zip _rd_, _rs_ - -Encoding:: -[wavedrom, , svg] -.... -{reg:[ -{bits: 2, name: 0x3}, -{bits: 5, name: 0x4}, -{bits: 5, name: 'rd'}, -{bits: 3, name: 0x1}, -{bits: 5, name: 'rs1'}, -{bits: 5, name: 0x1e}, -{bits: 7, name: 0x4}, -]} -.... - -Description:: -This instruction scatters all of the odd and even bits of a source word into -the high and low halves of a destination word. -It is the inverse of the <> instruction. -This instruction is available only on RV32. - -Operation:: -[source,sail] --- -foreach (i from 0 to xlen/2-1) { - X(rd)[2*i] = X(rs1)[i] - X(rd)[2*i+1] = X(rs1)[i+xlen/2] -} --- - -.Software Hint -[NOTE, caption="SH" ] -=============================================================== -This instruction is useful for implementing the SHA3 cryptographic -hash function on a 32-bit architecture, as it implements the -bit-interleaving operation used to speed up the 64-bit rotations -directly. 
-=============================================================== - -Included in:: -[%header,cols="4,2,2"] -|=== -|Extension -|Minimum version -|Lifecycle state - -|Zbkb (<<#zbkb>>) (RV32) -|v0.9.4 -|Frozen -|=== - -- cgit v1.1 From f44e0f0e424e833ad2195db2c74982dd352166aa Mon Sep 17 00:00:00 2001 From: wmat Date: Thu, 14 Mar 2024 09:42:20 -0400 Subject: Bumping version to 1.0.0 The version in the title was 0.0 so bumping to 1.0.0 --- src/b-st-ext.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index 59abd07..f7b4230 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -1,5 +1,5 @@ [[bits]] -== "B" Standard Extension for Bit Manipulation, Version 0.0 +== "B" Standard Extension for Bit Manipulation, Version 1.0.0 [[preface]] === Bit-manipulation a, b, c and s extensions grouped for public review and ratification -- cgit v1.1 From 10f7b78de7620635e39695343a3ad5323a248e56 Mon Sep 17 00:00:00 2001 From: wmat Date: Fri, 15 Mar 2024 11:42:52 -0400 Subject: Fixing Figure 16 diagram Figure 16 Standard portion of mie diagram was incorrect. Fixed missing MEIE bit.
--- src/images/bytefield/miereg-standard.adoc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/images/bytefield/miereg-standard.adoc b/src/images/bytefield/miereg-standard.adoc index d4affab..680fb1c 100644 --- a/src/images/bytefield/miereg-standard.adoc +++ b/src/images/bytefield/miereg-standard.adoc @@ -11,6 +11,8 @@ (draw-box "0" {:span 2}) (draw-box (text "LCOFIE" {:font-size 10}) {:span 1}) (draw-box "0" {:span 1}) +(draw-box "MEIE" {:span 1}) +(draw-box "0" {:span 1}) (draw-box "SEIE" {:span 1}) (draw-box "0" {:span 1}) (draw-box "MTIE" {:span 1}) @@ -36,4 +38,5 @@ (draw-box "1" {:span 1 :borders {}}) (draw-box "1" {:span 1 :borders {}}) (draw-box "1" {:span 1 :borders {}}) +(draw-box "1" {:span 1 :borders {}}) ---- -- cgit v1.1 From 0b3b9f7312f4c087137b714de144e7ee2e9b7f86 Mon Sep 17 00:00:00 2001 From: Rafael Sene Date: Fri, 15 Mar 2024 13:40:04 -0300 Subject: Upgrade the version of some GitHub Actions Signed-off-by: Rafael Sene --- .github/workflows/isa-build.yml | 22 ++++++---------------- .github/workflows/merge-and-release.yml | 5 +++-- 2 files changed, 9 insertions(+), 18 deletions(-) diff --git a/.github/workflows/isa-build.yml b/.github/workflows/isa-build.yml index 7135c26..ca1b4c5 100644 --- a/.github/workflows/isa-build.yml +++ b/.github/workflows/isa-build.yml @@ -28,7 +28,7 @@ jobs: steps: # Checkout the repository - name: Checkout repository - uses: actions/checkout@v3 + uses: actions/checkout@v4 # Set the short SHA for use in artifact names - name: Set short SHA @@ -57,7 +57,7 @@ jobs: # Upload the priv-isa-asciidoc PDF file - name: Upload priv-isa-asciidoc.pdf if: steps.build_files.outcome == 'success' - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: name: priv-isa-asciidoc-${{ env.SHORT_SHA }}.pdf path: ${{ github.workspace }}/build/priv-isa-asciidoc.pdf @@ -66,7 +66,7 @@ jobs: # Upload the priv-isa-asciidoc HTML file - name: Upload priv-isa-asciidoc.html if: steps.build_files.outcome == 'success' - uses: 
actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: name: priv-isa-asciidoc-${{ env.SHORT_SHA }}.html path: ${{ github.workspace }}/build/priv-isa-asciidoc.html @@ -75,7 +75,7 @@ jobs: # Upload the unpriv-isa-asciidoc PDF file - name: Upload unpriv-isa-asciidoc.pdf if: steps.build_files.outcome == 'success' - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: name: unpriv-isa-asciidoc-${{ env.SHORT_SHA }}.pdf path: ${{ github.workspace }}/build/unpriv-isa-asciidoc.pdf @@ -84,24 +84,15 @@ jobs: # Upload the unpriv-isa-asciidoc HTML file - name: Upload unpriv-isa-asciidoc.html if: steps.build_files.outcome == 'success' - uses: actions/upload-artifact@v3 + uses: actions/upload-artifact@v4 with: name: unpriv-isa-asciidoc-${{ env.SHORT_SHA }}.html path: ${{ github.workspace }}/build/unpriv-isa-asciidoc.html retention-days: 7 - # Upload the priv-isa-latex PDF file - - name: Upload riscv-privileged.pdf - if: steps.build_files.outcome == 'success' - uses: actions/upload-artifact@v3 - with: - name: riscv-privileged-latex-${{ env.SHORT_SHA }}.pdf - path: ${{ github.workspace }}/build/riscv-privileged.pdf - retention-days: 7 - - name: Create Release if: steps.build_files.outcome == 'success' && github.event_name == 'workflow_dispatch' && github.event.inputs.create_release == 'true' - uses: softprops/action-gh-release@v1 + uses: softprops/action-gh-release@v2 with: draft: false tag_name: riscv-isa-release-${{ env.SHORT_SHA }}-${{ env.CURRENT_DATE }} @@ -114,7 +105,6 @@ jobs: ${{ github.workspace }}/build/priv-isa-asciidoc.html ${{ github.workspace }}/build/unpriv-isa-asciidoc.pdf ${{ github.workspace }}/build/unpriv-isa-asciidoc.html - ${{ github.workspace }}/build/riscv-privileged.pdf env: GITHUB_TOKEN: ${{ secrets.GHTOKEN }} \ No newline at end of file diff --git a/.github/workflows/merge-and-release.yml b/.github/workflows/merge-and-release.yml index 3a93bc2..88390e0 100644 --- a/.github/workflows/merge-and-release.yml +++ 
b/.github/workflows/merge-and-release.yml @@ -68,14 +68,15 @@ jobs: path: ${{ github.workspace }}/build/unpriv-isa-asciidoc.html - name: Create Release - uses: softprops/action-gh-release@v1 + uses: softprops/action-gh-release@v2 env: GITHUB_TOKEN: ${{ secrets.GHTOKEN }} with: tag_name: riscv-isa-release-${{ env.SHORT_SHA }}-${{ env.CURRENT_DATE }} - release_name: Release riscv-isa-release-${{ env.SHORT_SHA }}-${{ env.CURRENT_DATE }} + name: Release riscv-isa-release-${{ env.SHORT_SHA }}-${{ env.CURRENT_DATE }} draft: false prerelease: false + make_latest: true generate_release_notes: true body: | This release was created by: ${{ github.event.sender.login }} -- cgit v1.1 From 2981c7f28c646d49d246beeeb77c45bd9eeaf443 Mon Sep 17 00:00:00 2001 From: Kuan-Wei Chiu Date: Sat, 16 Mar 2024 09:47:56 +0800 Subject: Fix typo in Zawrs title Replace 'Statndard' with 'Standard'. --- src/zawrs.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/zawrs.adoc b/src/zawrs.adoc index 456c582..eb94036 100644 --- a/src/zawrs.adoc +++ b/src/zawrs.adoc @@ -1,4 +1,4 @@ -== "Zawrs" Statndard extension for Wait-on-Reservation-Set instructions, Version 1.01 +== "Zawrs" Standard extension for Wait-on-Reservation-Set instructions, Version 1.01 The Zawrs extension defines a pair of instructions to be used in polling loops that allows a core to enter a low-power state and wait on a store to a memory -- cgit v1.1 From 42adac0504e7b1e6470df6aa0c1862fc4349a506 Mon Sep 17 00:00:00 2001 From: Andrew Waterman Date: Sat, 16 Mar 2024 20:25:50 -0700 Subject: RNMI handler -> RNMI trap handler --- src/rnmi.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/rnmi.adoc b/src/rnmi.adoc index cbd19da..7886c67 100644 --- a/src/rnmi.adoc +++ b/src/rnmi.adoc @@ -85,9 +85,9 @@ exception that precipitated the double trap. 
include::images/bytefield/mnstatus.edn[] The `mnstatus` CSR holds a two-bit field, MNPP, which on entry to the -trap handler holds the privilege mode of the interrupted context, +RNMI trap handler holds the privilege mode of the interrupted context, encoded in the same manner as `mstatus`.MPP. It also holds a one-bit -field, MNPV, which on entry to the trap handler holds the virtualization +field, MNPV, which on entry to the RNMI trap handler holds the virtualization mode of the interrupted context, encoded in the same manner as `mstatus`.MPV. -- cgit v1.1 From bc1f3faf0e443921efb9911b4338f2587acc6119 Mon Sep 17 00:00:00 2001 From: Andrew Waterman Date: Sat, 16 Mar 2024 20:31:28 -0700 Subject: Add Zicfilp support to Smrnmi --- src/images/bytefield/mnstatus.edn | 20 ++++++++++++-------- src/rnmi.adoc | 6 ++++++ 2 files changed, 18 insertions(+), 8 deletions(-) diff --git a/src/images/bytefield/mnstatus.edn b/src/images/bytefield/mnstatus.edn index 186bfb8..8a5f39d 100644 --- a/src/images/bytefield/mnstatus.edn +++ b/src/images/bytefield/mnstatus.edn @@ -5,25 +5,29 @@ (def row-header-fn nil) (def left-margin 30) (def right-margin 30) -(def boxes-per-row 32) -(draw-column-headers {:height 24 :font-size 24 :labels (reverse ["0" "" "" "2" "" "3" "4" "" "" "6" "" "" "" "7" "" "" "8" "" "" "10" "11" "" "" "12" "13" "" "" "" "" "" "MXLEN-1" ""])}) +(def boxes-per-row 35) +(draw-column-headers {:height 24 :font-size 24 :labels (reverse ["0" "" "" "2" "" "3" "4" "" "" "6" "" "" "7" "" "" "8" "" "" "9" "" "" "10" "" "11" "" "" "12" "13" "" "" "" "" "" "MXLEN-1" ""])}) (draw-box (text "Reserved" {:font-style "italic" :font-size 24}) {:span 8}) (draw-box (text "MNPP" {:font-size 24}) {:span 2 :text-anchor "end" :borders {:top :border-unrelated :bottom :border-unrelated :left :border-unrelated}}) (draw-box (text "(WARL)" {:font-weight "bold" :font-size 20}) {:span 2 :text-anchor "start" :borders {:top :border-unrelated :right :border-unrelated :bottom :border-unrelated}}) 
-(draw-box (text "Reserved" {:font-style "italic" :font-size 24}) {:span 4}) -(draw-box (text "MNPV" {:font-size 24}) {:span 3 :text-anchor "end" :borders {:top :border-unrelated :bottom :border-unrelated :left :border-unrelated}}) -(draw-box (text "(WARL)" {:font-weight "bold" :font-size 24}) {:span 3 :text-anchor "start" :borders {:top :border-unrelated :right :border-unrelated :bottom :border-unrelated}}) +(draw-box (text "Reserved" {:font-style "italic" :font-size 24}) {:span 3}) +(draw-box (text "MNPELP" {:font-style "italic" :font-size 20}) {:span 3}) +(draw-box (text "Reserved" {:font-style "italic" :font-size 20}) {:span 3}) +(draw-box (text "MNPV" {:font-size 24}) {:span 2 :text-anchor "end" :borders {:top :border-unrelated :bottom :border-unrelated :left :border-unrelated}}) +(draw-box (text "(WARL)" {:font-weight "bold" :font-size 20}) {:span 2 :text-anchor "start" :borders {:top :border-unrelated :right :border-unrelated :bottom :border-unrelated}}) (draw-box (text "Reserved" {:font-style "italic" :font-size 24}) {:span 4}) (draw-box "NMIE" {:span 2}) (draw-box (text "Reserved" {:font-style "italic" :font-size 24}) {:span 4}) (draw-box "MXLEN-13" {:span 8 :borders {}}) (draw-box "2" {:span 4 :borders {}}) -(draw-box "3" {:span 4 :borders {}}) -(draw-box "1" {:span 6 :borders {}}) +(draw-box "1" {:span 3 :borders {}}) +(draw-box "1" {:span 3 :borders {}}) +(draw-box "1" {:span 3 :borders {}}) +(draw-box "1" {:span 4 :borders {}}) (draw-box "3" {:span 4 :borders {}}) (draw-box "1" {:span 2 :borders {}}) (draw-box "3" {:span 4 :borders {}}) ----- \ No newline at end of file +---- diff --git a/src/rnmi.adoc b/src/rnmi.adoc index 7886c67..705fb7d 100644 --- a/src/rnmi.adoc +++ b/src/rnmi.adoc @@ -91,6 +91,10 @@ field, MNPV, which on entry to the RNMI trap handler holds the virtualization mode of the interrupted context, encoded in the same manner as `mstatus`.MPV. 
+If the Zicfilp extension is implemented, `mnstatus` also holds the MNPELP +field, which on entry to the RNMI trap handler holds the previous `ELP` state. +When an RNMI trap is taken, MNPELP is set to `ELP` and `ELP` is set to 0. + `mnstatus` also holds the NMIE bit. When NMIE=1, nonmaskable interrupts are enabled. When NMIE=0, _all_ interrupts are disabled. @@ -137,6 +141,8 @@ MNRET is an M-mode-only instruction that uses the values in `mnepc` and `mnstatus` to return to the program counter, privilege mode, and virtualization mode of the interrupted context. This instruction also sets `mnstatus`.NMIE. If MNRET changes the privilege mode to a mode less privileged than M, it also sets `mstatus`.MPRV to 0. +If the Zicfilp extension is implemented, then if `mnstatus`.MNPP holds the +value __y__, MNRET sets `ELP` to the logical AND of __y__LPE and `mnstatus`.MNPELP. === RNMI Operation -- cgit v1.1 From 7d4b5c875c5aaa318025148ec9cf06419b8d000e Mon Sep 17 00:00:00 2001 From: Andrew Waterman Date: Mon, 18 Mar 2024 15:01:41 -0700 Subject: Relax behavior for problematic source/dest mismatched EEW overlap case See discussion at https://lists.riscv.org/g/tech-vector-ext/message/845 --- src/v-st-ext.adoc | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc index 0d569ec..98344fe 100644 --- a/src/v-st-ext.adoc +++ b/src/v-st-ext.adoc @@ -977,6 +977,10 @@ exceptions in machines without register renaming. Any instruction encoding that violates the overlap constraints is reserved. +When source and destination registers overlap and have different EEW, the +instruction is mask- and tail-agnostic, regardless of the setting of the +`vta` and `vma` bits in `vtype`. 
+ The largest vector register group used by an instruction can not be greater than 8 vector registers (i.e., EMUL{le}8), and if a vector instruction would require greater than 8 vector registers in a group, -- cgit v1.1 From 586e9e4bfa7f2d2a1ea3aecaded4c19147cf246b Mon Sep 17 00:00:00 2001 From: Andrew Waterman Date: Mon, 18 Mar 2024 15:08:34 -0700 Subject: Clarify that vsetvli x0,x0 is reserved when vill was 1 beforehand See discussion at https://lists.riscv.org/g/tech-vector-ext/message/846 --- src/v-st-ext.adoc | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc index 98344fe..4821be9 100644 --- a/src/v-st-ext.adoc +++ b/src/v-st-ext.adoc @@ -1240,8 +1240,9 @@ vector length in `vl` is used as the AVL, and the resulting value is written to `vl`, but not to a destination register. This form can only be used when VLMAX and hence `vl` is not actually changed by the new SEW/LMUL ratio. Use of the instruction with a new SEW/LMUL ratio -that would result in a change of VLMAX is reserved. Implementations -may set `vill` in this case. +that would result in a change of VLMAX is reserved. +Use of the instruction is also reserved if `vill` was 1 beforehand. +Implementations may set `vill` in either case. NOTE: This last form of the instructions allows the `vtype` register to be changed while maintaining the current `vl`, provided VLMAX is not -- cgit v1.1 From c48f54acd85b7599f8da3cebb518380cf017fde2 Mon Sep 17 00:00:00 2001 From: Nick Knight Date: Mon, 18 Mar 2024 23:40:06 -0700 Subject: Clarify slide semantics (#1268) --- src/v-st-ext.adoc | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc index 4821be9..63089ad 100644 --- a/src/v-st-ext.adoc +++ b/src/v-st-ext.adoc @@ -4534,13 +4534,13 @@ Destination elements _OFFSET_ through `vl`-1 are written if unmasked and if _OFFSET_ < `vl`. 
---- - vslideup behavior for destination elements + vslideup behavior for destination elements (`vstart` < `vl`) OFFSET is amount to slideup, either from x register or a 5-bit immediate - 0 <= i < max(vstart, OFFSET) Unchanged - max(vstart, OFFSET) <= i < vl vd[i] = vs2[i-OFFSET] if v0.mask[i] enabled - vl <= i < VLMAX Follow tail policy + 0 <= i < min(vl, max(vstart, OFFSET)) Unchanged + max(vstart, OFFSET) <= i < vl vd[i] = vs2[i-OFFSET] if v0.mask[i] enabled + vl <= i < VLMAX Follow tail policy ---- The destination vector register group for `vslideup` cannot overlap @@ -4569,12 +4569,12 @@ using an unsigned integer in the `x` register specified by `rs1`, or a If XLEN > SEW, _OFFSET_ is _not_ truncated to SEW bits. ---- - vslidedown behavior for source elements for element i in slide + vslidedown behavior for source elements for element i in slide (`vstart` < `vl`) 0 <= i+OFFSET < VLMAX src[i] = vs2[i+OFFSET] VLMAX <= i+OFFSET src[i] = 0 - vslidedown behavior for destination element i in slide - 0 <= i < vstart Unchanged + vslidedown behavior for destination element i in slide (`vstart` < `vl`) + 0 <= i < vstart Unchanged vstart <= i < vl vd[i] = src[i] if v0.mask[i] enabled vl <= i < VLMAX Follow tail policy -- cgit v1.1 From f662bd20b0a57d4212efec376a0ec4da198c0f4a Mon Sep 17 00:00:00 2001 From: wmat Date: Tue, 19 Mar 2024 14:48:12 -0400 Subject: Fixed a couple of Notes Adding the standard Note formatting. --- src/v-st-ext.adoc | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc index 0d569ec..1fc7279 100644 --- a/src/v-st-ext.adoc +++ b/src/v-st-ext.adoc @@ -5138,10 +5138,13 @@ FP32 and FP64). Vector single-width floating-point reductions (<>) for EEW=32 and EEW=64 are supported as well as widening reductions from FP32 to FP64. 
-NOTE: As is the case with other RISC-V extensions, it is valid to +[NOTE] +==== +As is the case with other RISC-V extensions, it is valid to include overlapping extensions in the same ISA string. For example, RV64GCV and RV64GCV_Zve64f are both valid and equivalent ISA strings, as is RV64GCV_Zve64f_Zve32x_Zvl128b. +==== ==== Zvfhmin: Vector Extension for Minimal Half-Precision Floating-Point -- cgit v1.1 From a4382e9c8e285360a88d8056c1253e1525552393 Mon Sep 17 00:00:00 2001 From: Andrew Waterman Date: Tue, 19 Mar 2024 15:12:36 -0700 Subject: Smrnmi 0.5 is frozen --- src/rnmi.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/rnmi.adoc b/src/rnmi.adoc index 705fb7d..f505f56 100644 --- a/src/rnmi.adoc +++ b/src/rnmi.adoc @@ -1,9 +1,9 @@ [[rnmi]] -== "Smrnmi" Standard Extension for Resumable Non-Maskable Interrupts, Version 0.4 +== "Smrnmi" Standard Extension for Resumable Non-Maskable Interrupts, Version 0.5 [WARNING] ==== -*Warning! This draft specification may change before being accepted as +*Warning! This frozen specification may change before being accepted as standard by RISC-V International.* ==== -- cgit v1.1 From 77d541ebee986efab68252923dd08398423b067a Mon Sep 17 00:00:00 2001 From: wmat Date: Wed, 20 Mar 2024 09:57:13 -0400 Subject: strlen: Replace PTRLOG with explicit 3 This is a manual application of PR#182 against the Bitmanip repo. 
--- src/b-st-ext.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index f7b4230..587db86 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -3783,7 +3783,7 @@ strlen: .Lprologue: li a4, SZREG sub a4, a4, a3 // XLEN - offset - slli a3, a3, PTRLOG // offset * 8 + slli a3, a3, 3 // offset * 8 REG_L a2, 0(a1) // chunk /* * Shift the partial/unaligned chunk we loaded to remove the bytes -- cgit v1.1 From 03084395e98f19a09506f9a6cbf06a168cd3b9e3 Mon Sep 17 00:00:00 2001 From: wmat Date: Wed, 20 Mar 2024 10:04:46 -0400 Subject: show opcode as 7-bits like all other insn for unzip and zip #181 This is a manual application of PR#181 from Bitmanip repo. --- src/b-st-ext.adoc | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index 587db86..6f8dd23 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -3418,8 +3418,7 @@ Encoding:: [wavedrom, , svg] .... {reg:[ -{bits: 2, name: 0x3}, -{bits: 5, name: 0x4}, +{bits: 7, name: 0x13, attr: ['OP-IMM']}, {bits: 5, name: 'rd'}, {bits: 3, name: 0x5}, {bits: 5, name: 'rs1'}, @@ -3712,8 +3711,7 @@ Encoding:: [wavedrom, , svg] .... {reg:[ -{bits: 2, name: 0x3}, -{bits: 5, name: 0x4}, +{bits: 7, name: 0x13, attr: ['OP-IMM']}, {bits: 5, name: 'rd'}, {bits: 3, name: 0x1}, {bits: 5, name: 'rs1'}, -- cgit v1.1 From 8fdbc3b75ad7bdc726cbfcdd40d6b326b952f43f Mon Sep 17 00:00:00 2001 From: wmat Date: Wed, 20 Mar 2024 10:08:23 -0400 Subject: Fix desc for signextend This is a manual application of PR#172 from the Bitmanip repo. --- src/b-st-ext.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index 6f8dd23..52beb61 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -647,9 +647,9 @@ instructions that return the smaller/larger of two operands. 
===== Sign- and zero-extension -These instructions perform the sign-extension or zero-extension of the least significant 8 bits, 16 bits or 32 bits of the source register. +These instructions perform the sign-extension or zero-extension of the least significant 8 bits or 16 bits of the source register. -These instructions replace the generalized idioms `slli rD,rS,(XLEN-) + srli` (for zero-extension) or `slli + srai` (for sign-extension) for the sign-extension of 8-bit and 16-bit quantities, and for the zero-extension of 16-bit and 32-bit quantities. +These instructions replace the generalized idioms `slli rD,rS,(XLEN-) + srli` (for zero-extension) or `slli + srai` (for sign-extension) for the sign-extension of 8-bit and 16-bit quantities, and for the zero-extension of 16-bit quantities. [%header,cols="^1,^1,4,8"] |=== -- cgit v1.1
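Reviewer sketch (not part of the patch): the corrected wording in the last hunk says the dedicated instructions stand in for a shift-pair idiom, `slli` followed by `srli` for zero-extension or by `srai` for sign-extension. The model below is one way to check that equivalence for 8-bit and 16-bit quantities. It assumes the shift amount is XLEN minus the operand size, which the flattened patch text does not spell out, and every function name here is hypothetical rather than taken from the specification.

```python
# Illustrative model only; names are hypothetical and not from the spec.
XLEN = 64  # assumed register width; pass xlen=32 to model RV32


def _mask(value, xlen=XLEN):
    """Truncate a Python int to an XLEN-bit register value."""
    return value & ((1 << xlen) - 1)


def slli(value, shamt, xlen=XLEN):
    """Logical left shift within an XLEN-bit register."""
    return _mask(value << shamt, xlen)


def srli(value, shamt, xlen=XLEN):
    """Logical right shift: vacated high bits are filled with zeros."""
    return _mask(value, xlen) >> shamt


def srai(value, shamt, xlen=XLEN):
    """Arithmetic right shift: vacated high bits copy the sign bit."""
    value = _mask(value, xlen)
    if value >> (xlen - 1):      # sign bit set: reinterpret as negative
        value -= 1 << xlen
    return _mask(value >> shamt, xlen)


def zero_extend(value, size):
    """Direct definition: keep the low `size` bits, clear the rest."""
    return value & ((1 << size) - 1)


def sign_extend(value, size, xlen=XLEN):
    """Direct definition: replicate bit size-1 into all higher bits."""
    low = value & ((1 << size) - 1)
    if low >> (size - 1):        # top bit of the narrow value is set
        low -= 1 << size
    return _mask(low, xlen)


def zero_extend_idiom(value, size, xlen=XLEN):
    """Shift pair: slli by (xlen - size), then srli by the same amount."""
    shamt = xlen - size          # assumed shift amount, see lead-in
    return srli(slli(value, shamt, xlen), shamt, xlen)


def sign_extend_idiom(value, size, xlen=XLEN):
    """Shift pair: slli by (xlen - size), then srai by the same amount."""
    shamt = xlen - size          # assumed shift amount, see lead-in
    return srai(slli(value, shamt, xlen), shamt, xlen)
```

Under these assumptions, `sign_extend_idiom(x, 8)` and `sign_extend_idiom(x, 16)` model what a single sign-extend instruction would return, and `zero_extend_idiom(x, 16)` models the 16-bit zero-extend case that the revised sentence keeps.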