Diffstat (limited to 'src/f-st-ext.adoc')
-rw-r--r-- src/f-st-ext.adoc 541
1 files changed, 541 insertions, 0 deletions
diff --git a/src/f-st-ext.adoc b/src/f-st-ext.adoc
new file mode 100644
index 0000000..90239be
--- /dev/null
+++ b/src/f-st-ext.adoc
@@ -0,0 +1,541 @@
+[[single-float]]
+== `F` Standard Extension for Single-Precision Floating-Point, Version 2.2
+
+This chapter describes the standard instruction-set extension for
+single-precision floating-point, which is named `F` and adds
+single-precision floating-point computational instructions compliant
+with the IEEE 754-2008 arithmetic standard cite:[ieee754-2008]. The F extension depends on
+the `Zicsr` extension for control and status register access.
+
+=== F Register State
+
+The F extension adds 32 floating-point registers, `f0`–`f31`, each 32
+bits wide, and a floating-point control and status register `fcsr`,
+which contains the operating mode and exception status of the
+floating-point unit. This additional state is shown in
+<<fprs>>. We use the term FLEN to describe the width of
+the floating-point registers in the RISC-V ISA, and FLEN=32 for the F
+single-precision floating-point extension. Most floating-point
+instructions operate on values in the floating-point register file.
+Floating-point load and store instructions transfer floating-point
+values between registers and memory. Instructions to transfer values to
+and from the integer register file are also provided.
+
+[TIP]
+====
+We considered a unified register file for both integer and
+floating-point values as this simplifies software register allocation
+and calling conventions, and reduces total user state. However, a split
+organization increases the total number of registers accessible with a
+given instruction width, simplifies provision of enough regfile ports
+for wide superscalar issue, supports decoupled floating-point-unit
+architectures, and simplifies use of internal floating-point encoding
+techniques. Compiler support and calling conventions for split register
+file architectures are well understood, and using dirty bits on
+floating-point register file state can reduce context-switch overhead.
+====
+
+[[fprs]]
+.RISC-V standard F extension single-precision floating-point state
+image::f-standard.png[base,180,1000,align="center"]
+
+=== Floating-Point Control and Status Register
+
+The floating-point control and status register, `fcsr`, is a RISC-V
+control and status register (CSR). It is a 32-bit read/write register
+that selects the dynamic rounding mode for floating-point arithmetic
+operations and holds the accrued exception flags, as shown in <<fcsr>>.
+
+
+include::images/wavedrom/float-csr.adoc[]
+[[fcsr]]
+.Floating-point control and status register
+image::image_placeholder.png[]
+
+The `fcsr` register can be read and written with the FRCSR and FSCSR
+instructions, which are assembler pseudoinstructions built on the
+underlying CSR access instructions. FRCSR reads `fcsr` by copying it
+into integer register _rd_. FSCSR swaps the value in `fcsr` by copying
+the original value into integer register _rd_, and then writing a new
+value obtained from integer register _rs1_ into `fcsr`.
+
+The fields within the `fcsr` can also be accessed individually through
+different CSR addresses, and separate assembler pseudoinstructions are
+defined for these accesses. The FRRM instruction reads the Rounding Mode
+field `frm` and copies it into the least-significant three bits of
+integer register _rd_, with zero in all other bits. FSRM swaps the value
+in `frm` by copying the original value into integer register _rd_, and
+then writing a new value obtained from the three least-significant bits
+of integer register _rs1_ into `frm`. FRFLAGS and FSFLAGS are defined
+analogously for the Accrued Exception Flags field `fflags`.
+
+
+Bits 31–8 of the `fcsr` are reserved for other standard extensions. If
+these extensions are not present, implementations shall ignore writes to
+these bits and supply a zero value when read. Standard software should
+preserve the contents of these bits.
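+
+The pseudoinstruction semantics above can be sketched as a small software
+model. This is an illustrative sketch only, assuming the field layout of
+<<fcsr>> (`fflags` in bits 4–0, `frm` in bits 7–5) and an implementation
+without any extension that uses bits 31–8. The `Fcsr` class and its method
+names are hypothetical and exist only for this example.
+

```python
# Hypothetical software model of fcsr; not part of any real toolchain.
FFLAGS_MASK = 0x1F   # fcsr[4:0], accrued exception flags
FRM_SHIFT = 5        # fcsr[7:5], dynamic rounding mode
FRM_MASK = 0x7
IMPLEMENTED = 0xFF   # assumption: bits 31-8 are not backed by any extension


class Fcsr:
    def __init__(self, value=0):
        self.value = value & IMPLEMENTED

    def frcsr(self):
        """FRCSR: read the whole register."""
        return self.value

    def fscsr(self, rs1):
        """FSCSR: return the old value, then install the new one."""
        old = self.value
        # Writes to the reserved bits are ignored, so they read back as zero.
        self.value = rs1 & IMPLEMENTED
        return old

    def frrm(self):
        """FRRM: frm in the low three bits, zero in all other bits."""
        return (self.value >> FRM_SHIFT) & FRM_MASK

    def fsrm(self, rs1):
        """FSRM: swap frm with the three least-significant bits of rs1."""
        old = self.frrm()
        keep = self.value & ~(FRM_MASK << FRM_SHIFT) & IMPLEMENTED
        self.value = keep | ((rs1 & FRM_MASK) << FRM_SHIFT)
        return old

    def frflags(self):
        """FRFLAGS: fflags in the low five bits."""
        return self.value & FFLAGS_MASK

    def fsflags(self, rs1):
        """FSFLAGS: swap fflags with the five least-significant bits of rs1."""
        old = self.frflags()
        self.value = (self.value & ~FFLAGS_MASK & IMPLEMENTED) | (rs1 & FFLAGS_MASK)
        return old
```
+
+In this model, `fsrm` and `fsflags` leave the other field untouched, which
+is the reason the fields are exposed through separate CSR addresses.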
+
+Floating-point operations use either a static rounding mode encoded in
+the instruction, or a dynamic rounding mode held in `frm`. Rounding
+modes are encoded as shown in <<rm>>. A value of 111 in the
+instruction’s _rm_ field selects the dynamic rounding mode held in
+`frm`. The behavior of floating-point instructions that depend on
+rounding mode when executed with a reserved rounding mode is _reserved_,
+including both static reserved rounding modes (101–110) and dynamic
+reserved rounding modes (101–111). Some instructions, including widening
+conversions, have the _rm_ field but are nevertheless mathematically
+unaffected by the rounding mode; software should set their _rm_ field to
+RNE (000) but implementations must treat the _rm_ field as usual (in
+particular, with regard to decoding legal vs. reserved encodings).
+
+[[rm]]
+.Rounding mode encoding.
+[cols="^,^,<",options="header",]
+|===
+|Rounding Mode |Mnemonic |Meaning
+|000 |RNE |Round to Nearest, ties to Even
+|001 |RTZ |Round towards Zero
+|010 |RDN |Round Down (towards latexmath:[$-\infty$])
+|011 |RUP |Round Up (towards latexmath:[$+\infty$])
+|100 |RMM |Round to Nearest, ties to Max Magnitude
+|101 | |_Reserved for future use._
+|110 | |_Reserved for future use._
+|111 |DYN |In instruction’s _rm_ field, selects dynamic rounding mode;
+| | |In Rounding Mode register, _reserved_.
+|===
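+
+The selection between static and dynamic rounding modes can be sketched as
+follows. This is a minimal illustration of <<rm>>, not a description of
+any implementation; the function name `effective_rounding_mode` is
+hypothetical, and a result of `None` stands for a _reserved_ encoding.
+

```python
# Encodings from the rounding mode table; 0b101 and 0b110 are reserved.
MODES = {0b000: "RNE", 0b001: "RTZ", 0b010: "RDN", 0b011: "RUP", 0b100: "RMM"}
DYN = 0b111


def effective_rounding_mode(rm, frm):
    """Return the mnemonic of the mode an instruction rounds with.

    rm  -- the 3-bit rm field of the instruction
    frm -- the 3-bit dynamic rounding mode held in fcsr
    Returns None when the selected encoding is reserved.
    """
    selected = frm if rm == DYN else rm
    # DYN is only meaningful in the instruction; in frm it is reserved,
    # so a lookup of 0b111 here correctly yields None.
    return MODES.get(selected)
```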
+
+
+[NOTE]
+====
+The C99 language standard effectively mandates the provision of a
+dynamic rounding mode register. In typical implementations, writes to
+the dynamic rounding mode CSR state will serialize the pipeline. Static
+rounding modes are used to implement specialized arithmetic operations
+that often have to switch frequently between different rounding modes.
+
+The ratified version of the F spec mandated that an illegal instruction
+exception be raised when an instruction is executed with a reserved
+dynamic rounding mode. This has been weakened to reserved, which matches
+the behavior of static rounding-mode instructions. Raising an illegal
+instruction exception is still valid behavior when encountering a
+reserved encoding, so implementations compatible with the ratified spec
+are compatible with the weakened spec.
+====
+
+
+The accrued exception flags indicate the exception conditions that have
+arisen on any floating-point arithmetic instruction since the field was
+last reset by software, as shown in <<bitdef>>. The base
+RISC-V ISA does not support generating a trap on the setting of a
+floating-point exception flag.
+(((floating-point, exception flag)))
+
+[[bitdef]]
+.Accrued exception flag encoding.
+[cols="^,<",options="header",]
+|===
+|Flag Mnemonic |Flag Meaning
+|NV |Invalid Operation
+|DZ |Divide by Zero
+|OF |Overflow
+|UF |Underflow
+|NX |Inexact
+|===
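+
+Because traps on flags are not supported, software checks `fflags`
+explicitly. The sketch below decodes a value read from `fflags`, assuming
+the bit order of <<fcsr>> with NV in bit 4 down to NX in bit 0. The helper
+names `decode_fflags` and `any_flag_set` are hypothetical.
+

```python
# Bit position of each accrued exception flag within fflags (assumed from
# the fcsr figure: NV is the most significant flag, NX the least).
FLAG_BITS = {"NV": 4, "DZ": 3, "OF": 2, "UF": 1, "NX": 0}


def decode_fflags(fflags):
    """Return the mnemonics of all flags set in a 5-bit fflags value."""
    return [name for name, bit in FLAG_BITS.items() if fflags & (1 << bit)]


def any_flag_set(fflags, *names):
    """True if at least one of the named flags is set."""
    mask = 0
    for name in names:
        mask |= 1 << FLAG_BITS[name]
    return bool(fflags & mask)
```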
+
+[NOTE]
+====
+As allowed by the standard, we do not support traps on floating-point
+exceptions in the F extension, but instead require explicit checks of
+the flags in software. We considered adding branches controlled directly
+by the contents of the floating-point accrued exception flags, but
+ultimately chose to omit these instructions to keep the ISA simple.
+====
+
+=== NaN Generation and Propagation
+(((NaN, generation)))
+(((NaN, propagation)))
+
+Except when otherwise stated, if the result of a floating-point
+operation is NaN, it is the canonical NaN. The canonical NaN has a
+positive sign and all significand bits clear except the MSB, a.k.a. the
+quiet bit. For single-precision floating-point, this corresponds to the
+pattern `0x7fc00000`.
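+
+The canonical NaN pattern can be checked field by field. The sketch below
+works on raw 32-bit patterns so that no host floating-point behavior is
+involved; the helper names are hypothetical.
+

```python
CANONICAL_NAN = 0x7FC00000


def fields(bits):
    """Split a single-precision pattern into (sign, exponent, fraction)."""
    return (bits >> 31) & 1, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def is_nan(bits):
    _, exponent, fraction = fields(bits)
    return exponent == 0xFF and fraction != 0


def canonicalize(bits):
    """Model of a NaN-producing operation: any NaN becomes the canonical NaN."""
    return CANONICAL_NAN if is_nan(bits) else bits
```
+
+For the canonical NaN, `fields` returns sign 0, exponent `0xFF`, and a
+fraction of `0x400000`, that is, only the quiet bit set.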
+
+[TIP]
+====
+We considered propagating NaN payloads, as is recommended by the
+standard, but this decision would have increased hardware cost.
+Moreover, since this feature is optional in the standard, it cannot be
+used in portable code.
+
+Implementors are free to provide a NaN payload propagation scheme as a
+nonstandard extension enabled by a nonstandard operating mode. However,
+the canonical NaN scheme described above must always be supported and
+should be the default mode.
+====
+
+[NOTE]
+====
+We require implementations to return the standard-mandated default
+values in the case of exceptional conditions, without any further
+intervention on the part of user-level software (unlike the Alpha ISA
+floating-point trap barriers). We believe full hardware handling of
+exceptional cases will become more common, and so wish to avoid
+complicating the user-level ISA to optimize other approaches.
+Implementations can always trap to machine-mode software handlers to
+provide exceptional default values.
+====
+
+=== Subnormal Arithmetic
+(((operations, subnormal)))
+
+Operations on subnormal numbers are handled in accordance with the IEEE
+754-2008 standard.
+
+In the parlance of the IEEE standard, tininess is detected after
+rounding.
+(((tininess, handling)))
+
+[NOTE]
+====
+Detecting tininess after rounding results in fewer spurious underflow
+signals.
+====
+
+=== Single-Precision Load and Store Instructions
+
+Floating-point loads and stores use the same base+offset addressing mode
+as the integer base ISAs, with a base address in register _rs1_ and a
+12-bit signed byte offset. The FLW instruction loads a single-precision
+floating-point value from memory into floating-point register _rd_. FSW
+stores a single-precision value from floating-point register _rs2_ to
+memory.
+
+include::images/wavedrom/sp-load-store.adoc[]
+[[sp-ldst]]
+.SP load and store
+image::image_placeholder.png[]
+
+FLW and FSW are only guaranteed to execute atomically if the effective
+address is naturally aligned.
+
+FLW and FSW do not modify the bits being transferred; in particular, the
+payloads of non-canonical NaNs are preserved.
+
+As described in <<sp-ldst>>, the execution
+environment defines whether misaligned floating-point loads and stores
+are handled invisibly or raise a contained or fatal trap.
+
+[[single-float-compute]]
+=== Single-Precision Floating-Point Computational Instructions
+
+Floating-point arithmetic instructions with one or two source operands
+use the R-type format with the OP-FP major opcode. FADD.S and FMUL.S
+perform single-precision floating-point addition and multiplication
+respectively, between _rs1_ and _rs2_. FSUB.S performs the
+single-precision floating-point subtraction of _rs2_ from _rs1_. FDIV.S
+performs the single-precision floating-point division of _rs1_ by _rs2_.
+FSQRT.S computes the square root of _rs1_. In each case, the result is
+written to _rd_.
+
+The 2-bit floating-point format field _fmt_ is encoded as shown in
+<<fmt>>. It is set to _S_ (00) for all instructions in the F
+extension.
+
+[[fmt]]
+.Format field encoding
+[cols="^,^,<",options="header",]
+|===
+|_fmt_ field |Mnemonic |Meaning
+|00 |S |32-bit single-precision
+|01 |D |64-bit double-precision
+|10 |H |16-bit half-precision
+|11 |Q |128-bit quad-precision
+|===
+
+All floating-point operations that perform rounding can select the
+rounding mode using the _rm_ field with the encoding shown in
+<<rm>>.
+
+Floating-point minimum-number and maximum-number instructions FMIN.S and
+FMAX.S write, respectively, the smaller or larger of _rs1_ and _rs2_ to
+_rd_. For the purposes of these instructions only, the value
+latexmath:[$-0.0$] is considered to be less than the value
+latexmath:[$+0.0$]. If both inputs are NaNs, the result is the canonical
+NaN. If only one operand is a NaN, the result is the non-NaN operand.
+Signaling NaN inputs set the invalid operation exception flag, even when
+the result is not NaN.
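+
+The rules above can be summarized in a short reference model. This is a
+sketch under stated assumptions: operands and results are raw 32-bit
+patterns, the second element of the result stands for the invalid
+operation flag, and the function name `fmin_fmax_s` is hypothetical.
+

```python
import struct

CANONICAL_NAN = 0x7FC00000


def is_nan(bits):
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0


def is_snan(bits):
    # A signaling NaN is a NaN whose quiet bit (fraction MSB) is clear.
    return is_nan(bits) and (bits & 0x00400000) == 0


def value(bits):
    """Numeric value of a non-NaN single-precision pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def _order_key(bits):
    # Order by value, then break the -0.0 == +0.0 tie so that -0.0 is smaller.
    return (value(bits), 0 if bits >> 31 else 1)


def fmin_fmax_s(a, b, want_max=False):
    """Return (result_bits, invalid_flag) for FMIN.S or FMAX.S."""
    nv = is_snan(a) or is_snan(b)
    if is_nan(a) and is_nan(b):
        return CANONICAL_NAN, nv
    if is_nan(a):
        return b, nv
    if is_nan(b):
        return a, nv
    pick = max if want_max else min
    return pick(a, b, key=_order_key), nv
```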
+
+[NOTE]
+====
+Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S
+instructions were amended to implement the proposed IEEE 754-201x
+minimumNumber and maximumNumber operations, rather than the IEEE
+754-2008 minNum and maxNum operations. These operations differ in their
+handling of signaling NaNs.
+====
+
+include::images/wavedrom/spfloat.adoc[]
+[[spfloat]]
+.Single-Precision Floating-Point Computational Instructions
+image::image_placeholder.png[]
+(((floating point, fused multiply-add)))
+
+Floating-point fused multiply-add instructions require a new standard
+instruction format. R4-type instructions specify three source registers
+(_rs1_, _rs2_, and _rs3_) and a destination register (_rd_). This format
+is only used by the floating-point fused multiply-add instructions.
+
+FMADD.S multiplies the values in _rs1_ and _rs2_, adds the value in
+_rs3_, and writes the final result to _rd_. FMADD.S computes
+_(rs1latexmath:[$\times$]rs2)+rs3_.
+
+FMSUB.S multiplies the values in _rs1_ and _rs2_, subtracts the value in
+_rs3_, and writes the final result to _rd_. FMSUB.S computes
+_(rs1latexmath:[$\times$]rs2)-rs3_.
+
+FNMSUB.S multiplies the values in _rs1_ and _rs2_, negates the product,
+adds the value in _rs3_, and writes the final result to _rd_. FNMSUB.S
+computes _-(rs1latexmath:[$\times$]rs2)+rs3_.
+
+FNMADD.S multiplies the values in _rs1_ and _rs2_, negates the product,
+subtracts the value in _rs3_, and writes the final result to _rd_.
+FNMADD.S computes _-(rs1latexmath:[$\times$]rs2)-rs3_.
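+
+The four sign conventions are easy to confuse, so the sketch below states
+them side by side. It uses exact rational arithmetic to mirror the fact
+that the product is not rounded; the final rounding to single precision,
+signed zeros, and exception flags are deliberately not modeled. The
+function names are hypothetical.
+

```python
from fractions import Fraction


def fmadd(rs1, rs2, rs3):
    return (Fraction(rs1) * Fraction(rs2)) + Fraction(rs3)


def fmsub(rs1, rs2, rs3):
    return (Fraction(rs1) * Fraction(rs2)) - Fraction(rs3)


def fnmsub(rs1, rs2, rs3):
    # Negates the product, then adds rs3.
    return -(Fraction(rs1) * Fraction(rs2)) + Fraction(rs3)


def fnmadd(rs1, rs2, rs3):
    # Negates the product, then subtracts rs3.
    return -(Fraction(rs1) * Fraction(rs2)) - Fraction(rs3)
```
+
+As an example of why the unrounded product matters, take _rs1_ = _rs2_ =
+1 + 2^-12^ and _rs3_ = 1 + 2^-11^. The exact product is 1 + 2^-11^ +
+2^-24^, so `fmsub` yields 2^-24^, whereas rounding the product to single
+precision first would give a difference of zero.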
+
+[NOTE]
+====
+The FNMSUB and FNMADD instructions are counterintuitively named, owing
+to the naming of the corresponding instructions in MIPS-IV. The MIPS
+instructions were defined to negate the sum, rather than negating the
+product as the RISC-V instructions do, so the naming scheme was more
+rational at the time. The two definitions differ with respect to
+signed-zero results. The RISC-V definition matches the behavior of the
+x86 and ARM fused multiply-add instructions, but unfortunately the
+RISC-V FNMSUB and FNMADD instruction names are swapped compared to x86
+and ARM.
+====
+
+include::images/wavedrom/fnmaddsub.adoc[]
+[[fnmaddsub]]
+.F[N]MADD/F[N]MSUB instructions
+image::image_placeholder.png[]
+
+[NOTE]
+====
+The fused multiply-add (FMA) instructions consume a large part of the
+32-bit instruction encoding space. One alternative considered was to
+restrict FMA to dynamic rounding modes only, but static rounding
+modes are useful in code that exploits the lack of product rounding.
+Another alternative would have been to use rd to provide rs3, but this
+would require additional move instructions in some common sequences. The
+current design still leaves a large portion of the 32-bit encoding space
+open while avoiding having FMA be non-orthogonal.
+====
+
+The fused multiply-add instructions must set the invalid operation
+exception flag when the multiplicands are latexmath:[$\infty$] and zero,
+even when the addend is a quiet NaN.
+
+[NOTE]
+====
+The IEEE 754-2008 standard permits, but does not require, raising the
+invalid exception for the operation
+latexmath:[$\infty\times 0\ +$]qNaN.
+====
+
+=== Single-Precision Floating-Point Conversion and Move Instructions
+
+Floating-point-to-integer and integer-to-floating-point conversion
+instructions are encoded in the OP-FP major opcode space. FCVT.W.S or
+FCVT.L.S converts a floating-point number in floating-point register
+_rs1_ to a signed 32-bit or 64-bit integer, respectively, in integer
+register _rd_. FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed
+integer, respectively, in integer register _rs1_ into a floating-point
+number in floating-point register _rd_. FCVT.WU.S, FCVT.LU.S, FCVT.S.WU,
+and FCVT.S.LU variants convert to or from unsigned integer values. For
+XLENlatexmath:[$>32$], FCVT.W[U].S sign-extends the 32-bit result to the
+destination register width. FCVT.L[U].S and FCVT.S.L[U] are RV64-only
+instructions. If the rounded result is not representable in the
+destination format, it is clipped to the nearest value and the invalid
+flag is set. <<int_conv>> gives the range of valid inputs
+for FCVT._int_.S and the behavior for invalid inputs.
+(((floating-point, conversion)))
+
+[[int_conv]]
+.Domains of float-to-integer conversions and behavior for invalid inputs
+[cols="<,>,>,>,>",options="header",]
+|===
+| |FCVT.W.S |FCVT.WU.S |FCVT.L.S |FCVT.LU.S
+|Minimum valid input (after rounding) |latexmath:[$-2^{31}$] |0
+|latexmath:[$-2^{63}$] |0
+
+|Maximum valid input (after rounding) |latexmath:[$2^{31}-1$]
+|latexmath:[$2^{32}-1$] |latexmath:[$2^{63}-1$] |latexmath:[$2^{64}-1$]
+
+|Output for out-of-range negative input |latexmath:[$-2^{31}$] |0
+|latexmath:[$-2^{63}$] |0
+
+|Output for latexmath:[$-\infty$] |latexmath:[$-2^{31}$] |0
+|latexmath:[$-2^{63}$] |0
+
+|Output for out-of-range positive input |latexmath:[$2^{31}-1$]
+|latexmath:[$2^{32}-1$] |latexmath:[$2^{63}-1$] |latexmath:[$2^{64}-1$]
+
+|Output for latexmath:[$+\infty$] or NaN |latexmath:[$2^{31}-1$]
+|latexmath:[$2^{32}-1$] |latexmath:[$2^{63}-1$] |latexmath:[$2^{64}-1$]
+|===
+
+All floating-point to integer and integer to floating-point conversion
+instructions round according to the _rm_ field. A floating-point
+register can be initialized to floating-point positive zero using
+FCVT.S.W _rd_, `x0`, which will never set any exception flags.
+
+All floating-point conversion instructions set the Inexact exception
+flag if the rounded result differs from the operand value and the
+Invalid exception flag is not set.
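+
+The clipping and flag rules of <<int_conv>> can be sketched as a reference
+model. This is an illustration only: the input is a host floating-point
+value assumed to hold a single-precision number, the result is a
+mathematical integer (sign-extension to XLEN is not modeled), RMM is
+omitted for brevity, and the name `fcvt_int_s` is hypothetical.
+

```python
import math

# (minimum, maximum) valid result for each destination integer type.
INT_RANGES = {
    "W": (-2**31, 2**31 - 1),
    "WU": (0, 2**32 - 1),
    "L": (-2**63, 2**63 - 1),
    "LU": (0, 2**64 - 1),
}

# Python's round() on a float rounds to nearest with ties to even.
ROUNDERS = {"RNE": round, "RTZ": math.trunc, "RDN": math.floor, "RUP": math.ceil}


def fcvt_int_s(x, kind="W", rm="RNE"):
    """Return (integer_result, flags) for FCVT.<kind>.S."""
    lo, hi = INT_RANGES[kind]
    if math.isnan(x):
        return hi, {"NV"}
    if math.isinf(x):
        return (hi if x > 0 else lo), {"NV"}
    rounded = ROUNDERS[rm](x)
    # The range check applies to the rounded value, not to the raw input.
    if rounded < lo:
        return lo, {"NV"}
    if rounded > hi:
        return hi, {"NV"}
    # Inexact is raised only when Invalid is not.
    return rounded, ({"NX"} if rounded != x else set())
```
+
+Note that `fcvt_int_s(-0.25, "WU", "RTZ")` is a valid conversion in this
+model: the rounded value 0 lies in range, so only the Inexact flag is set.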
+
+include::images/wavedrom/spfloat.adoc[]
+[[fcvt]]
+.SP float convert and move
+image::image_placeholder.png[]
+
+Floating-point to floating-point sign-injection instructions, FSGNJ.S,
+FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the
+sign bit from _rs1_. For FSGNJ, the result’s sign bit is _rs2_’s sign
+bit; for FSGNJN, the result’s sign bit is the opposite of _rs2_’s sign
+bit; and for FSGNJX, the sign bit is the XOR of the sign bits of _rs1_
+and _rs2_. Sign-injection instructions do not set floating-point
+exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S _rx, ry,
+ry_ moves _ry_ to _rx_ (assembler pseudoinstruction FMV.S _rx, ry_);
+FSGNJN.S _rx, ry, ry_ moves the negation of _ry_ to _rx_ (assembler
+pseudoinstruction FNEG.S _rx, ry_); and FSGNJX.S _rx, ry, ry_ moves the
+absolute value of _ry_ to _rx_ (assembler pseudoinstruction FABS.S _rx,
+ry_).
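+
+Since sign injection is pure bit manipulation, it can be sketched directly
+on 32-bit patterns. The function names below are hypothetical and only
+mirror the instruction mnemonics.
+

```python
SIGN = 0x80000000       # sign bit of a single-precision value
MAGNITUDE = 0x7FFFFFFF  # exponent and fraction bits


def fsgnj_s(rs1, rs2):
    """Magnitude of rs1 with the sign of rs2."""
    return (rs1 & MAGNITUDE) | (rs2 & SIGN)


def fsgnjn_s(rs1, rs2):
    """Magnitude of rs1 with the opposite of the sign of rs2."""
    return (rs1 & MAGNITUDE) | (~rs2 & SIGN)


def fsgnjx_s(rs1, rs2):
    """Magnitude of rs1 with the XOR of both sign bits."""
    return rs1 ^ (rs2 & SIGN)


def fmv_s(ry):
    return fsgnj_s(ry, ry)


def fneg_s(ry):
    return fsgnjn_s(ry, ry)


def fabs_s(ry):
    return fsgnjx_s(ry, ry)
```
+
+The three pseudoinstructions fall out of passing the same register twice,
+and NaN payloads pass through unchanged because only bit 31 is touched.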
+
+include::images/wavedrom/spfloat-cn-cmp.adoc[]
+[[spfloat-cn-cmp]]
+.SP floating point convert and compare
+image::image_placeholder.png[]
+
+[NOTE]
+====
+The sign-injection instructions provide floating-point MV, ABS, and NEG,
+as well as supporting a few other operations, including the IEEE
+copySign operation and sign manipulation in transcendental math function
+libraries. Although MV, ABS, and NEG only need a single register
+operand, whereas FSGNJ instructions need two, it is unlikely most
+microarchitectures would add optimizations to benefit from the reduced
+number of register reads for these relatively infrequent instructions.
+Even in this case, a microarchitecture can simply detect when both
+source registers are the same for FSGNJ instructions and only read a
+single copy.
+====
+
+Instructions are provided to move bit patterns between the
+floating-point and integer registers. FMV.X.W moves the single-precision
+value in floating-point register _rs1_ represented in IEEE 754-2008
+encoding to the lower 32 bits of integer register _rd_. The bits are not
+modified in the transfer, and in particular, the payloads of
+non-canonical NaNs are preserved. For RV64, the higher 32 bits of the
+destination register are filled with copies of the floating-point
+number’s sign bit.
+
+FMV.W.X moves the single-precision value encoded in IEEE 754-2008
+standard encoding from the lower 32 bits of integer register _rs1_ to
+the floating-point register _rd_. The bits are not modified in the
+transfer, and in particular, the payloads of non-canonical NaNs are
+preserved.
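+
+The two moves can be sketched as follows, with register contents modeled
+as unsigned integers. The function names and the `xlen` parameter are
+hypothetical conveniences for this example.
+

```python
def fmv_x_w(f_bits, xlen=64):
    """FMV.X.W: copy a 32-bit pattern into an integer register.

    For XLEN greater than 32, the upper bits are copies of bit 31.
    """
    f_bits &= 0xFFFFFFFF
    if xlen == 32:
        return f_bits
    upper = (2**xlen - 2**32) if (f_bits >> 31) else 0
    return upper | f_bits


def fmv_w_x(x_bits):
    """FMV.W.X: copy the lower 32 bits of an integer register, unmodified."""
    return x_bits & 0xFFFFFFFF
```
+
+A round trip through both functions returns the original pattern,
+including a non-canonical NaN such as `0xFFC12345`.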
+
+[NOTE]
+====
+The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and
+FMV.X.S. The use of W is more consistent with their semantics as an
+instruction that moves 32 bits without interpreting them. This became
+clearer after defining NaN-boxing. To avoid disturbing existing code,
+both the W and S versions will be supported by tools.
+====
+
+include::images/wavedrom/spfloat-mv.adoc[]
+[[spfloat-mv]]
+.SP floating point move
+image::image_placeholder.png[]
+
+
+[TIP]
+====
+The base floating-point ISA was defined so as to allow implementations
+to employ an internal recoding of the floating-point format in registers
+to simplify handling of subnormal values and possibly to reduce
+functional unit latency. To this end, the F extension avoids
+representing integer values in the floating-point registers by defining
+conversion and comparison operations that read and write the integer
+register file directly. This also removes many of the common cases where
+explicit moves between integer and floating-point registers are
+required, reducing instruction count and critical paths for common
+mixed-format code sequences.
+====
+
+=== Single-Precision Floating-Point Compare Instructions
+
+Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the
+specified comparison between floating-point registers (_rs1_ = _rs2_,
+_rs1_ < _rs2_, _rs1_ latexmath:[$\leq$] _rs2_), writing 1 to the integer
+register _rd_ if the condition holds, and 0 otherwise.
+
+FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as
+_signaling_ comparisons: that is, they set the invalid operation
+exception flag if either input is NaN. FEQ.S performs a _quiet_
+comparison: it only sets the invalid operation exception flag if either
+input is a signaling NaN. For all three instructions, the result is 0 if
+either operand is NaN.
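+
+The difference between quiet and signaling comparisons can be sketched on
+raw 32-bit patterns. Each function returns the value written to _rd_
+together with a boolean standing for the invalid operation flag; the
+function names are hypothetical.
+

```python
import operator
import struct


def is_nan(bits):
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0


def is_snan(bits):
    # A signaling NaN is a NaN whose quiet bit (fraction MSB) is clear.
    return is_nan(bits) and (bits & 0x00400000) == 0


def value(bits):
    """Numeric value of a non-NaN single-precision pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def feq_s(a, b):
    """Quiet comparison: invalid only for signaling NaN inputs."""
    if is_nan(a) or is_nan(b):
        return 0, is_snan(a) or is_snan(b)
    return int(value(a) == value(b)), False


def _signaling_compare(a, b, op):
    """Signaling comparison: invalid for any NaN input."""
    if is_nan(a) or is_nan(b):
        return 0, True
    return int(op(value(a), value(b))), False


def flt_s(a, b):
    return _signaling_compare(a, b, operator.lt)


def fle_s(a, b):
    return _signaling_compare(a, b, operator.le)
```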
+
+include::images/wavedrom/spfloat-comp.adoc[]
+[[spfloat-comp]]
+.SP floating point compare
+image::image_placeholder.png[]
+
+[NOTE]
+====
+The F extension provides a latexmath:[$\leq$] comparison, whereas the
+base ISAs provide a latexmath:[$\geq$] branch comparison. Because
+latexmath:[$\leq$] can be synthesized from latexmath:[$\geq$] and
+vice-versa, there is no performance implication to this inconsistency,
+but it is nevertheless an unfortunate incongruity in the ISA.
+====
+
+=== Single-Precision Floating-Point Classify Instruction
+
+The FCLASS.S instruction examines the value in floating-point register
+_rs1_ and writes to integer register _rd_ a 10-bit mask that indicates
+the class of the floating-point number. The format of the mask is
+described in <<fclass>>. The corresponding bit in _rd_ will
+be set if the property is true and clear otherwise. All other bits in
+_rd_ are cleared. Note that exactly one bit in _rd_ will be set.
+FCLASS.S does not set the floating-point exception flags.
+(((floating-point, classification)))
+
+include::images/wavedrom/spfloat-classify.adoc[]
+[[spfloat-classify]]
+.SP floating point classify
+image::image_placeholder.png[]
+
+[[fclass]]
+.Format of result of FCLASS instruction.
+[cols="^,<",options="header",]
+|===
+|_rd_ bit |Meaning
+|0 |_rs1_ is latexmath:[$-\infty$].
+|1 |_rs1_ is a negative normal number.
+|2 |_rs1_ is a negative subnormal number.
+|3 |_rs1_ is latexmath:[$-0$].
+|4 |_rs1_ is latexmath:[$+0$].
+|5 |_rs1_ is a positive subnormal number.
+|6 |_rs1_ is a positive normal number.
+|7 |_rs1_ is latexmath:[$+\infty$].
+|8 |_rs1_ is a signaling NaN.
+|9 |_rs1_ is a quiet NaN.
+|===
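+
+The classification in <<fclass>> follows directly from the sign, exponent,
+and fraction fields. The sketch below is a reference model on raw 32-bit
+patterns; the function name `fclass_s` is hypothetical.
+

```python
def fclass_s(bits):
    """Return the 10-bit FCLASS.S mask for a single-precision pattern."""
    sign = (bits >> 31) & 1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF

    if exponent == 0xFF:
        if fraction == 0:
            bit = 0 if sign else 7           # -inf / +inf
        elif fraction & 0x400000:
            bit = 9                          # quiet NaN (quiet bit set)
        else:
            bit = 8                          # signaling NaN
    elif exponent == 0:
        if fraction == 0:
            bit = 3 if sign else 4           # -0 / +0
        else:
            bit = 2 if sign else 5           # negative / positive subnormal
    else:
        bit = 1 if sign else 6               # negative / positive normal

    # Exactly one bit is set; all other bits are clear.
    return 1 << bit
```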
+