diff options
author | elisa <elisa@riscv.org> | 2021-09-10 14:07:18 -0700 |
---|---|---|
committer | elisa <elisa@riscv.org> | 2021-09-10 14:07:18 -0700 |
commit | 64353b3717c9d387759c61d16d3984760a028046 (patch) | |
tree | 7dc4b3a1aff37dfd38ff921fa21113a64989a5aa /src/f-st-ext.adoc | |
parent | e1a563505224b47129b0cbb5c46ee22f5e0acddb (diff) | |
download | riscv-isa-manual-64353b3717c9d387759c61d16d3984760a028046.zip riscv-isa-manual-64353b3717c9d387759c61d16d3984760a028046.tar.gz riscv-isa-manual-64353b3717c9d387759c61d16d3984760a028046.tar.bz2 |
adding converted adoc files
Diffstat (limited to 'src/f-st-ext.adoc')
-rw-r--r-- | src/f-st-ext.adoc | 541 |
1 files changed, 541 insertions, 0 deletions
diff --git a/src/f-st-ext.adoc b/src/f-st-ext.adoc new file mode 100644 index 0000000..90239be --- /dev/null +++ b/src/f-st-ext.adoc @@ -0,0 +1,541 @@ +[[single-float]] +== `F` Standard Extension for Single-Precision Floating-Point, Version 2.2 + +This chapter describes the standard instruction-set extension for +single-precision floating-point, which is named `F` and adds +single-precision floating-point computational instructions compliant +with the IEEE 754-2008 arithmetic standard cite:[ieee754-2008]. The F extension depends on +the `Zicsr` extension for control and status register access. + +=== F Register State + +The F extension adds 32 floating-point registers, `f0`–`f31`, each 32 +bits wide, and a floating-point control and status register `fcsr`, +which contains the operating mode and exception status of the +floating-point unit. This additional state is shown in +<<fprs>>. We use the term FLEN to describe the width of +the floating-point registers in the RISC-V ISA, and FLEN=32 for the F +single-precision floating-point extension. Most floating-point +instructions operate on values in the floating-point register file. +Floating-point load and store instructions transfer floating-point +values between registers and memory. Instructions to transfer values to +and from the integer register file are also provided. + +[TIP] +==== +We considered a unified register file for both integer and +floating-point values as this simplifies software register allocation +and calling conventions, and reduces total user state. However, a split +organization increases the total number of registers accessible with a +given instruction width, simplifies provision of enough regfile ports +for wide superscalar issue, supports decoupled floating-point-unit +architectures, and simplifies use of internal floating-point encoding +techniques. Compiler support and calling conventions for split register +file architectures are well understood, and using dirty bits on +floating-point register file state can reduce context-switch overhead. +==== + +[[fprs]] +.RISC-V standard F extension single-precision floating-point state +image::f-standard.png[base,180,1000,align="center"] + +=== Floating-Point Control and Status Register + +The floating-point control and status register, `fcsr`, is a RISC-V +control and status register (CSR). It is a 32-bit read/write register +that selects the dynamic rounding mode for floating-point arithmetic +operations and holds the accrued exception flags, as shown in <<fcsr>>. + + +include::images/wavedrom/float-csr.adoc[] +[[fcsr]] +.Floating-point control and status register +image::image_placeholder.png[] + +The `fcsr` register can be read and written with the FRCSR and FSCSR +instructions, which are assembler pseudoinstructions built on the +underlying CSR access instructions. FRCSR reads `fcsr` by copying it +into integer register _rd_. FSCSR swaps the value in ` fcsr` by copying +the original value into integer register _rd_, and then writing a new +value obtained from integer register _rs1_ into `fcsr`. + +The fields within the `fcsr` can also be accessed individually through +different CSR addresses, and separate assembler pseudoinstructions are +defined for these accesses. The FRRM instruction reads the Rounding Mode +field `frm` and copies it into the least-significant three bits of +integer register _rd_, with zero in all other bits. FSRM swaps the value +in `frm` by copying the original value into integer register _rd_, and +then writing a new value obtained from the three least-significant bits +of integer register _rs1_ into `frm`. FRFLAGS and FSFLAGS are defined +analogously for the Accrued Exception Flags field `fflags`. + + +Bits 31–8 of the `fcsr` are reserved for other standard extensions. If +these extensions are not present, implementations shall ignore writes to +these bits and supply a zero value when read. Standard software should +preserve the contents of these bits. + +Floating-point operations use either a static rounding mode encoded in +the instruction, or a dynamic rounding mode held in `frm`. Rounding +modes are encoded as shown in <<rm>>. A value of 111 in the +instruction’s _rm_ field selects the dynamic rounding mode held in +`frm`. The behavior of floating-point instructions that depend on +rounding mode when executed with a reserved rounding mode is _reserved_, +including both static reserved rounding modes (101–110) and dynamic +reserved rounding modes (101–111). Some instructions, including widening +conversions, have the _rm_ field but are nevertheless mathematically +unaffected by the rounding mode; software should set their _rm_ field to +RNE (000) but implementations must treat the _rm_ field as usual (in +particular, with regard to decoding legal vs. reserved encodings). + +[[rm]] +.Rounding mode encoding. +[cols="^,^,<",options="header",] +|=== +|Rounding Mode |Mnemonic |Meaning +|000 |RNE |Round to Nearest, ties to Even +|001 |RTZ |Round towards Zero +|010 |RDN |Round Down (towards latexmath:[$-\infty$]) +|011 |RUP |Round Up (towards latexmath:[$+\infty$]) +|100 |RMM |Round to Nearest, ties to Max Magnitude +|101 | |_Reserved for future use._ +|110 | |_Reserved for future use._ +|111 |DYN |In instruction’s _rm_ field, selects dynamic rounding mode; +| | |In Rounding Mode register, _reserved_. +|=== + + +[NOTE] +==== +The C99 language standard effectively mandates the provision of a +dynamic rounding mode register. In typical implementations, writes to +the dynamic rounding mode CSR state will serialize the pipeline. Static +rounding modes are used to implement specialized arithmetic operations +that often have to switch frequently between different rounding modes. + +The ratified version of the F spec mandated that an illegal instruction +exception was raised when an instruction was executed with a reserved +dynamic rounding mode. This has been weakened to reserved, which matches +the behavior of static rounding-mode instructions. Raising an illegal +instruction exception is still valid behavior when encountering a +reserved encoding, so implementations compatible with the ratified spec +are compatible with the weakened spec. +==== + + +The accrued exception flags indicate the exception conditions that have +arisen on any floating-point arithmetic instruction since the field was +last reset by software, as shown in <<bitdef>>. The base +RISC-V ISA does not support generating a trap on the setting of a +floating-point exception flag. +(((floating-point, excpetion flag))) + +[[bitdef]] +.Accrued exception flag encoding. +[cols="^,<",options="header",] +|=== +|Flag Mnemonic |Flag Meaning +|NV |Invalid Operation +|DZ |Divide by Zero +|OF |Overflow +|UF |Underflow +|NX |Inexact +|=== + +[NOTE] +==== +As allowed by the standard, we do not support traps on floating-point +exceptions in the F extension, but instead require explicit checks of +the flags in software. We considered adding branches controlled directly +by the contents of the floating-point accrued exception flags, but +ultimately chose to omit these instructions to keep the ISA simple. +==== + +=== NaN Generation and Propagation +(((NaN, generation))) +(((NaN, propagation))) + +Except when otherwise stated, if the result of a floating-point +operation is NaN, it is the canonical NaN. The canonical NaN has a +positive sign and all significand bits clear except the MSB, a.k.a. the +quiet bit. For single-precision floating-point, this corresponds to the +pattern `0x7fc00000`. + +[TIP] +==== +We considered propagating NaN payloads, as is recommended by the +standard, but this decision would have increased hardware cost. +Moreover, since this feature is optional in the standard, it cannot be +used in portable code. + +Implementors are free to provide a NaN payload propagation scheme as a +nonstandard extension enabled by a nonstandard operating mode. However, +the canonical NaN scheme described above must always be supported and +should be the default mode. +==== + +[NOTE] +==== +We require implementations to return the standard-mandated default +values in the case of exceptional conditions, without any further +intervention on the part of user-level software (unlike the Alpha ISA +floating-point trap barriers). We believe full hardware handling of +exceptional cases will become more common, and so wish to avoid +complicating the user-level ISA to optimize other approaches. +Implementations can always trap to machine-mode software handlers to +provide exceptional default values. +==== + +=== Subnormal Arithmetic +(((operations, subnormal))) + +Operations on subnormal numbers are handled in accordance with the IEEE +754-2008 standard. + +In the parlance of the IEEE standard, tininess is detected after +rounding. +(((tininess, handling))) + +[NOTE] +==== +Detecting tininess after rounding results in fewer spurious underflow +signals. +==== + +=== Single-Precision Load and Store Instructions + +Floating-point loads and stores use the same base+offset addressing mode +as the integer base ISAs, with a base address in register _rs1_ and a +12-bit signed byte offset. The FLW instruction loads a single-precision +floating-point value from memory into floating-point register _rd_. FSW +stores a single-precision value from floating-point register _rs2_ to +memory. + +include::images/wavedrom/sp-load-store.adoc[] +[[sp-ldst]] +.SP load and store +image::image_placeholder.png[] + +FLW and FSW are only guaranteed to execute atomically if the effective +address is naturally aligned. + +FLW and FSW do not modify the bits being transferred; in particular, the +payloads of non-canonical NaNs are preserved. + +As described in <<sp-ldst>>, the execution +environment defines whether misaligned floating-point loads and stores +are handled invisibly or raise a contained or fatal trap. + +[[single-float-compute]] +=== Single-Precision Floating-Point Computational Instructions + +Floating-point arithmetic instructions with one or two source operands +use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S +perform single-precision floating-point addition and multiplication +respectively, between _rs1_ and _rs2_. FSUB.S performs the +single-precision floating-point subtraction of _rs2_ from _rs1_. FDIV.S +performs the single-precision floating-point division of _rs1_ by _rs2_. +FSQRT.S computes the square root of _rs1_. In each case, the result is +written to _rd_. + +The 2-bit floating-point format field _fmt_ is encoded as shown in +<<fmt>>. It is set to _S_ (00) for all instructions in the F +extension. + +[[fmt]] +.Format field encoding +[cols="^,^,<",options="header",] +|=== +|_fmt_ field |Mnemonic |Meaning +|00 |S |32-bit single-precision +|01 |D |64-bit double-precision +|10 |H |16-bit half-precision +|11 |Q |128-bit quad-precision +|=== + +All floating-point operations that perform rounding can select the +rounding mode using the _rm_ field with the encoding shown in +<<rm>>. + +Floating-point minimum-number and maximum-number instructions FMIN.S and +FMAX.S write, respectively, the smaller or larger of _rs1_ and _rs2_ to +_rd_. For the purposes of these instructions only, the value +latexmath:[$-0.0$] is considered to be less than the value +latexmath:[$+0.0$]. If both inputs are NaNs, the result is the canonical +NaN. If only one operand is a NaN, the result is the non-NaN operand. +Signaling NaN inputs set the invalid operation exception flag, even when +the result is not NaN. + +[NOTE] +==== +Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S +instructions were amended to implement the proposed IEEE 754-201x +minimumNumber and maximumNumber operations, rather than the IEEE +754-2008 minNum and maxNum operations. These operations differ in their +handling of signaling NaNs. +==== + +include::images/wavedrom/spfloat.adoc[] +[[spfloat]] +.Single-Precision Floating-Point Computational Instructions +image::image_placeholder.png[] +(((floating point, fused multiply-add))) + +Floating-point fused multiply-add instructions require a new standard +instruction format. R4-type instructions specify three source registers +(_rs1_, _rs2_, and _rs3_) and a destination register (_rd_). This format +is only used by the floating-point fused multiply-add instructions. + +FMADD.S multiplies the values in _rs1_ and _rs2_, adds the value in +_rs3_, and writes the final result to _rd_. FMADD.S computes +_(rs1latexmath:[$\times$]rs2)+rs3_. + +FMSUB.S multiplies the values in _rs1_ and _rs2_, subtracts the value in +_rs3_, and writes the final result to _rd_. FMSUB.S computes +_(rs1latexmath:[$\times$]rs2)-rs3_. + +FNMSUB.S multiplies the values in _rs1_ and _rs2_, negates the product, +adds the value in _rs3_, and writes the final result to _rd_. FNMSUB.S +computes _-(rs1latexmath:[$\times$]rs2)+rs3_. + +FNMADD.S multiplies the values in _rs1_ and _rs2_, negates the product, +subtracts the value in _rs3_, and writes the final result to _rd_. +FNMADD.S computes _-(rs1latexmath:[$\times$]rs2)-rs3_. + +[NOTE] +==== +The FNMSUB and FNMADD instructions are counterintuitively named, owing +to the naming of the corresponding instructions in MIPS-IV. The MIPS +instructions were defined to negate the sum, rather than negating the +product as the RISC-V instructions do, so the naming scheme was more +rational at the time. The two definitions differ with respect to +signed-zero results. The RISC-V definition matches the behavior of the +x86 and ARM fused multiply-add instructions, but unfortunately the +RISC-V FNMSUB and FNMADD instruction names are swapped compared to x86 +and ARM. +==== + +include::images/wavedrom/fnmaddsub.adoc[] +[[fnmaddsub]] +.F[N]MADD/F[N]MSUB instructions +image::image_placeholder.png[] + +[NOTE] +==== +The fused multiply-add (FMA) instructions consume a large part of the +32-bit instruction encoding space. Some alternatives considered were to +restrict FMA to only use dynamic rounding modes, but static rounding +modes are useful in code that exploits the lack of product rounding. +Another alternative would have been to use rd to provide rs3, but this +would require additional move instructions in some common sequences. The +current design still leaves a large portion of the 32-bit encoding space +open while avoiding having FMA be non-orthogonal. +==== + +The fused multiply-add instructions must set the invalid operation +exception flag when the multiplicands are latexmath:[$\infty$] and zero, +even when the addend is a quiet NaN. + +[NOTE] +==== +The IEEE 754-2008 standard permits, but does not require, raising the +invalid exception for the operation +latexmath:[$\infty\times 0\ +$]qNaN. +==== + +=== Single-Precision Floating-Point Conversion and Move Instructions + +Floating-point-to-integer and integer-to-floating-point conversion +instructions are encoded in the OP-FP major opcode space. FCVT.W.S or +FCVT.L.S converts a floating-point number in floating-point register +_rs1_ to a signed 32-bit or 64-bit integer, respectively, in integer +register _rd_. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed +integer, respectively, in integer register _rs1_ into a floating-point +number in floating-point register _rd_. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, +and FCVT.S.LU variants convert to or from unsigned integer values. For +XLENlatexmath:[$>32$], FCVT.W[U].S sign-extends the 32-bit result to the +destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only +instructions. If the rounded result is not representable in the +destination format, it is clipped to the nearest value and the invalid +flag is set. <<int_conv>> gives the range of valid inputs +for FCVT._int_.S and the behavior for invalid inputs. +(((floating-point, conversion))) + +[[int_conv]] +.Domains of float-to-integer conversions and behavior for invalid inputs +[cols="<,>,>,>,>",options="header",] +|=== +| |FCVT.W.S |FCVT.WU.S |FCVT.L.S |FCVT.LU.S +|Minimum valid input (after rounding) |latexmath:[$-2^{31}$] |0 +|latexmath:[$-2^{63}$] |0 + +|Maximum valid input (after rounding) |latexmath:[$2^{31}-1$] +|latexmath:[$2^{32}-1$] |latexmath:[$2^{63}-1$] |latexmath:[$2^{64}-1$] + +|Output for out-of-range negative input |latexmath:[$-2^{31}$] |0 +|latexmath:[$-2^{63}$] |0 + +|Output for latexmath:[$-\infty$] |latexmath:[$-2^{31}$] |0 +|latexmath:[$-2^{63}$] |0 + +|Output for out-of-range positive input |latexmath:[$2^{31}-1$] +|latexmath:[$2^{32}-1$] |latexmath:[$2^{63}-1$] |latexmath:[$2^{64}-1$] + +|Output for latexmath:[$+\infty$] or NaN |latexmath:[$2^{31}-1$] +|latexmath:[$2^{32}-1$] |latexmath:[$2^{63}-1$] |latexmath:[$2^{64}-1$] +|=== + +All floating-point to integer and integer to floating-point conversion +instructions round according to the _rm_ field. A floating-point +register can be initialized to floating-point positive zero using +FCVT.S.W _rd_, `x0`, which will never set any exception flags. + +All floating-point conversion instructions set the Inexact exception +flag if the rounded result differs from the operand value and the +Invalid exception flag is not set. + +include::images/wavedrom/spfloat.adoc[] +[[fcvt]] +.SP float convert and move +image::image_placeholder.png[] + +Floating-point to floating-point sign-injection instructions, FSGNJ.S, +FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the +sign bit from _rs1_. For FSGNJ, the result’s sign bit is _rs2_’s sign +bit; for FSGNJN, the result’s sign bit is the opposite of _rs2_’s sign +bit; and for FSGNJX, the sign bit is the XOR of the sign bits of _rs1_ +and _rs2_. Sign-injection instructions do not set floating-point +exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S _rx, ry, +ry_ moves _ry_ to _rx_ (assembler pseudoinstruction FMV.S _rx, ry_); +FSGNJN.S _rx, ry, ry_ moves the negation of _ry_ to _rx_ (assembler +pseudoinstruction FNEG.S _rx, ry_); and FSGNJX.S _rx, ry, ry_ moves the +absolute value of _ry_ to _rx_ (assembler pseudoinstruction FABS.S _rx, +ry_). + +include::images/wavedrom/spfloat-cn-cmp.adoc[] +[[spfloat-cn-cmp]] +.SP floating point convert and compare +image::image_placeholder.png[] + +[NOTE] +==== +The sign-injection instructions provide floating-point MV, ABS, and NEG, +as well as supporting a few other operations, including the IEEE +copySign operation and sign manipulation in transcendental math function +libraries. Although MV, ABS, and NEG only need a single register +operand, whereas FSGNJ instructions need two, it is unlikely most +microarchitectures would add optimizations to benefit from the reduced +number of register reads for these relatively infrequent instructions. +Even in this case, a microarchitecture can simply detect when both +source registers are the same for FSGNJ instructions and only read a +single copy. +==== + +Instructions are provided to move bit patterns between the +floating-point and integer registers. FMV.X.W moves the single-precision +value in floating-point register _rs1_ represented in IEEE 754-2008 +encoding to the lower 32 bits of integer register _rd_. The bits are not +modified in the transfer, and in particular, the payloads of +non-canonical NaNs are preserved. For RV64, the higher 32 bits of the +destination register are filled with copies of the floating-point +number’s sign bit. + +FMV.W.X moves the single-precision value encoded in IEEE 754-2008 +standard encoding from the lower 32 bits of integer register _rs1_ to +the floating-point register _rd_. The bits are not modified in the +transfer, and in particular, the payloads of non-canonical NaNs are +preserved. + +[NOTE] +==== +The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and +FMV.X.S. The use of W is more consistent with their semantics as an +instruction that moves 32 bits without interpreting them. This became +clearer after defining NaN-boxing. To avoid disturbing existing code, +both the W and S versions will be supported by tools. +==== + +include::images/wavedrom/spfloat-mv.adoc[] +[[spfloat-mv]] +.SP floating point move +image::image_placeholder.png[] + + +[TIP] +==== +The base floating-point ISA was defined so as to allow implementations +to employ an internal recoding of the floating-point format in registers +to simplify handling of subnormal values and possibly to reduce +functional unit latency. To this end, the F extension avoids +representing integer values in the floating-point registers by defining +conversion and comparison operations that read and write the integer +register file directly. This also removes many of the common cases where +explicit moves between integer and floating-point registers are +required, reducing instruction count and critical paths for common +mixed-format code sequences. +==== + +=== Single-Precision Floating-Point Compare Instructions + +Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the +specified comparison between floating-point registers +(latexmath:[$\mbox{\em rs1} += \mbox{\em rs2}$], latexmath:[$\mbox{\em rs1} < \mbox{\em rs2}$], +latexmath:[$\mbox{\em rs1} \leq +\mbox{\em rs2}$]) writing 1 to the integer register _rd_ if the +condition holds, and 0 otherwise. + +FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as +_signaling_ comparisons: that is, they set the invalid operation +exception flag if either input is NaN. FEQ.S performs a _quiet_ +comparison: it only sets the invalid operation exception flag if either +input is a signaling NaN. For all three instructions, the result is 0 if +either operand is NaN. + +include::images/wavedrom/spfloat-comp.adoc[] +[[spfloat-comp]] +.SP floating point compare +image::image_placeholder.png[] + +[NOTE] +==== +The F extension provides a latexmath:[$\leq$] comparison, whereas the +base ISAs provide a latexmath:[$\geq$] branch comparison. Because +latexmath:[$\leq$] can be synthesized from latexmath:[$\geq$] and +vice-versa, there is no performance implication to this inconsistency, +but it is nevertheless an unfortunate incongruity in the ISA. +==== + +=== Single-Precision Floating-Point Classify Instruction + +The FCLASS.S instruction examines the value in floating-point register +_rs1_ and writes to integer register _rd_ a 10-bit mask that indicates +the class of the floating-point number. The format of the mask is +described in <<fclass>>. The corresponding bit in _rd_ will +be set if the property is true and clear otherwise. All other bits in +_rd_ are cleared. Note that exactly one bit in _rd_ will be set. +FCLASS.S does not set the floating-point exception flags. +(((floating-point, classification))) + +include::images/wavedrom/spfloat-classify.adoc[] +[[spfloat-classify]] +.SP floating point classify +image::image_placeholder.png[] + +[[fclass]] +.Format of result of FCLASS instruction. +[cols="^,<",options="header",] +|=== +|_rd_ bit |Meaning +|0 |_rs1_ is latexmath:[$-\infty$]. +|1 |_rs1_ is a negative normal number. +|2 |_rs1_ is a negative subnormal number. +|3 |_rs1_ is latexmath:[$-0$]. +|4 |_rs1_ is latexmath:[$+0$]. +|5 |_rs1_ is a positive subnormal number. +|6 |_rs1_ is a positive normal number. +|7 |_rs1_ is latexmath:[$+\infty$]. +|8 |_rs1_ is a signaling NaN. +|9 |_rs1_ is a quiet NaN. +|=== + |