aboutsummaryrefslogtreecommitdiff
path: root/src/zfh.tex
diff options
context:
space:
mode:
Diffstat (limited to 'src/zfh.tex')
-rw-r--r--src/zfh.tex422
1 files changed, 422 insertions, 0 deletions
diff --git a/src/zfh.tex b/src/zfh.tex
new file mode 100644
index 0000000..3ccebf9
--- /dev/null
+++ b/src/zfh.tex
@@ -0,0 +1,422 @@
+\chapter{``Zfh'' and ``Zfhmin'' Standard Extensions for Half-Precision Floating-Point,
+ Version 0.1}
+
+{\bf Warning! This draft specification may change before being
+accepted as standard by RISC-V International.}
+
+This chapter describes the Zfh standard extension for 16-bit half-precision
+binary floating-point instructions compliant with the IEEE 754-2008 arithmetic
+standard.
+The Zfh extension depends on the single-precision floating-point extension, F.
+The NaN-boxing scheme described in Section~\ref{nanboxing} is extended to
+allow a half-precision value to be NaN-boxed inside a single-precision value
+(which may be recursively NaN-boxed inside a double- or quad-precision value
+when the D or Q extension is present).
+
+\begin{commentary}
+This extension primarily provides instructions that consume half-precision
+operands and produce half-precision results.
+However, it is also common to compute on half-precision data using higher
+intermediate precision.
+Although this extension provides explicit conversion instructions that suffice
+to implement that pattern, future extensions might further accelerate such
+computation with additional instructions that implicitly widen their
+operands---e.g., half$\times$half$+$single$\rightarrow$single---or implicitly
+narrow their results---e.g., half$+$single$\rightarrow$half.
+\end{commentary}
+
+\section{Half-Precision Load and Store Instructions}
+
+New 16-bit variants of LOAD-FP and STORE-FP instructions are added,
+encoded with a new value for the funct3 width field.
+
+\vspace{-0.2in}
+\begin{center}
+\begin{tabular}{M@{}R@{}F@{}R@{}O}
+\\
+\instbitrange{31}{20} &
+\instbitrange{19}{15} &
+\instbitrange{14}{12} &
+\instbitrange{11}{7} &
+\instbitrange{6}{0} \\
+\hline
+\multicolumn{1}{|c|}{imm[11:0]} &
+\multicolumn{1}{c|}{rs1} &
+\multicolumn{1}{c|}{width} &
+\multicolumn{1}{c|}{rd} &
+\multicolumn{1}{c|}{opcode} \\
+\hline
+12 & 5 & 3 & 5 & 7 \\
+offset[11:0] & base & H & dest & LOAD-FP \\
+\end{tabular}
+\end{center}
+
+\vspace{-0.2in}
+\begin{center}
+\begin{tabular}{O@{}R@{}R@{}F@{}R@{}O}
+\\
+\instbitrange{31}{25} &
+\instbitrange{24}{20} &
+\instbitrange{19}{15} &
+\instbitrange{14}{12} &
+\instbitrange{11}{7} &
+\instbitrange{6}{0} \\
+\hline
+\multicolumn{1}{|c|}{imm[11:5]} &
+\multicolumn{1}{c|}{rs2} &
+\multicolumn{1}{c|}{rs1} &
+\multicolumn{1}{c|}{width} &
+\multicolumn{1}{c|}{imm[4:0]} &
+\multicolumn{1}{c|}{opcode} \\
+\hline
+7 & 5 & 5 & 3 & 5 & 7 \\
+offset[11:5] & src & base & H & offset[4:0] & STORE-FP \\
+\end{tabular}
+\end{center}
+
+FLH and FSH are only guaranteed to execute atomically if the effective address
+is naturally aligned.
+
+FLH and FSH do not modify the bits being transferred; in particular, the
+payloads of non-canonical NaNs are preserved.
+FLH NaN-boxes the result written to {\em rd}, whereas FSH ignores all but
+the lower 16 bits in {\em rs2}.
+
+\section{Half-Precision Computational Instructions}
+
+A new supported format is added to the format field of most
+instructions, as shown in Table~\ref{tab:fpextfmth}.
+
+\begin{table}[htp]
+\begin{center}
+\begin{tabular}{|c|c|l|}
+\hline
+{\em fmt} field &
+Mnemonic &
+Meaning \\
+\hline
+00 & S & 32-bit single-precision \\
+01 & D & 64-bit double-precision \\
+10 & H & 16-bit half-precision \\
+11 & Q & 128-bit quad-precision \\
+\hline
+\end{tabular}
+\end{center}
+\caption{Format field encoding.}
+\label{tab:fpextfmth}
+\end{table}
+
+The half-precision floating-point computational instructions are
+defined analogously to their single-precision counterparts, but operate on
+half-precision operands and produce half-precision results.
+
+\vspace{-0.2in}
+\begin{center}
+\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}
+\\
+\instbitrange{31}{27} &
+\instbitrange{26}{25} &
+\instbitrange{24}{20} &
+\instbitrange{19}{15} &
+\instbitrange{14}{12} &
+\instbitrange{11}{7} &
+\instbitrange{6}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct5} &
+\multicolumn{1}{c|}{fmt} &
+\multicolumn{1}{c|}{rs2} &
+\multicolumn{1}{c|}{rs1} &
+\multicolumn{1}{c|}{rm} &
+\multicolumn{1}{c|}{rd} &
+\multicolumn{1}{c|}{opcode} \\
+\hline
+5 & 2 & 5 & 5 & 3 & 5 & 7 \\
+FADD/FSUB & H & src2 & src1 & RM & dest & OP-FP \\
+FMUL/FDIV & H & src2 & src1 & RM & dest & OP-FP \\
+FMIN-MAX & H & src2 & src1 & MIN/MAX & dest & OP-FP \\
+FSQRT & H & 0 & src & RM & dest & OP-FP \\
+\end{tabular}
+\end{center}
+
+\vspace{-0.2in}
+\begin{center}
+\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}
+\\
+\instbitrange{31}{27} &
+\instbitrange{26}{25} &
+\instbitrange{24}{20} &
+\instbitrange{19}{15} &
+\instbitrange{14}{12} &
+\instbitrange{11}{7} &
+\instbitrange{6}{0} \\
+\hline
+\multicolumn{1}{|c|}{rs3} &
+\multicolumn{1}{c|}{fmt} &
+\multicolumn{1}{c|}{rs2} &
+\multicolumn{1}{c|}{rs1} &
+\multicolumn{1}{c|}{rm} &
+\multicolumn{1}{c|}{rd} &
+\multicolumn{1}{c|}{opcode} \\
+\hline
+5 & 2 & 5 & 5 & 3 & 5 & 7 \\
+src3 & H & src2 & src1 & RM & dest & F[N]MADD/F[N]MSUB \\
+\end{tabular}
+\end{center}
+
+\section{Half-Precision Convert and Move Instructions}
+
+New floating-point-to-integer and integer-to-floating-point conversion
+instructions are added. These instructions are defined analogously to the
+single-precision-to-integer and integer-to-single-precision conversion
+instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point
+number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or
+FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a
+half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and
+FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and
+FCVT.H.L[U] are RV64-only instructions.
+
+\vspace{-0.2in}
+\begin{center}
+\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}
+\\
+\instbitrange{31}{27} &
+\instbitrange{26}{25} &
+\instbitrange{24}{20} &
+\instbitrange{19}{15} &
+\instbitrange{14}{12} &
+\instbitrange{11}{7} &
+\instbitrange{6}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct5} &
+\multicolumn{1}{c|}{fmt} &
+\multicolumn{1}{c|}{rs2} &
+\multicolumn{1}{c|}{rs1} &
+\multicolumn{1}{c|}{rm} &
+\multicolumn{1}{c|}{rd} &
+\multicolumn{1}{c|}{opcode} \\
+\hline
+5 & 2 & 5 & 5 & 3 & 5 & 7 \\
+FCVT.{\em int}.H & H & W[U]/L[U] & src & RM & dest & OP-FP \\
+FCVT.H.{\em int} & H & W[U]/L[U] & src & RM & dest & OP-FP \\
+\end{tabular}
+\end{center}
+
+New floating-point-to-floating-point conversion instructions are added. These
+instructions are defined analogously to the double-precision
+floating-point-to-floating-point conversion instructions.
+FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to
+a single-precision floating-point number, or vice-versa, respectively.
+If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision
+floating-point number to a double-precision floating-point number, or
+vice-versa, respectively.
+If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision
+floating-point number to a quad-precision floating-point number, or
+vice-versa, respectively.
+
+\vspace{-0.2in}
+\begin{center}
+\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}
+\\
+\instbitrange{31}{27} &
+\instbitrange{26}{25} &
+\instbitrange{24}{20} &
+\instbitrange{19}{15} &
+\instbitrange{14}{12} &
+\instbitrange{11}{7} &
+\instbitrange{6}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct5} &
+\multicolumn{1}{c|}{fmt} &
+\multicolumn{1}{c|}{rs2} &
+\multicolumn{1}{c|}{rs1} &
+\multicolumn{1}{c|}{rm} &
+\multicolumn{1}{c|}{rd} &
+\multicolumn{1}{c|}{opcode} \\
+\hline
+5 & 2 & 5 & 5 & 3 & 5 & 7 \\
+FCVT.S.H & S & H & src & RM & dest & OP-FP \\
+FCVT.H.S & H & S & src & RM & dest & OP-FP \\
+FCVT.D.H & D & H & src & RM & dest & OP-FP \\
+FCVT.H.D & H & D & src & RM & dest & OP-FP \\
+FCVT.Q.H & Q & H & src & RM & dest & OP-FP \\
+FCVT.H.Q & H & Q & src & RM & dest & OP-FP \\
+\end{tabular}
+\end{center}
+
+Floating-point to floating-point sign-injection instructions, FSGNJ.H,
+FSGNJN.H, and FSGNJX.H are defined analogously to the single-precision
+sign-injection instruction.
+
+\vspace{-0.2in}
+\begin{center}
+\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}
+\\
+\instbitrange{31}{27} &
+\instbitrange{26}{25} &
+\instbitrange{24}{20} &
+\instbitrange{19}{15} &
+\instbitrange{14}{12} &
+\instbitrange{11}{7} &
+\instbitrange{6}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct5} &
+\multicolumn{1}{c|}{fmt} &
+\multicolumn{1}{c|}{rs2} &
+\multicolumn{1}{c|}{rs1} &
+\multicolumn{1}{c|}{rm} &
+\multicolumn{1}{c|}{rd} &
+\multicolumn{1}{c|}{opcode} \\
+\hline
+5 & 2 & 5 & 5 & 3 & 5 & 7 \\
+FSGNJ & H & src2 & src1 & J[N]/JX & dest & OP-FP \\
+\end{tabular}
+\end{center}
+
+
+Instructions are provided to move bit patterns between the floating-point and
+integer registers.
+FMV.X.H moves the half-precision value in floating-point register {\em rs1} to
+a representation in IEEE 754-2008 standard encoding in integer register {\em
+rd}, filling the upper XLEN-16 bits with copies of the floating-point number's
+sign bit.
+
+FMV.H.X moves the half-precision value encoded in IEEE 754-2008 standard
+encoding from the lower 16 bits of integer register {\em rs1} to the
+floating-point register {\em rd}, NaN-boxing the result.
+
+FMV.X.H and FMV.H.X do not modify the bits being transferred; in particular,
+the payloads of non-canonical NaNs are preserved.
+
+\vspace{-0.2in}
+\begin{center}
+\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}
+\\
+\instbitrange{31}{27} &
+\instbitrange{26}{25} &
+\instbitrange{24}{20} &
+\instbitrange{19}{15} &
+\instbitrange{14}{12} &
+\instbitrange{11}{7} &
+\instbitrange{6}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct5} &
+\multicolumn{1}{c|}{fmt} &
+\multicolumn{1}{c|}{rs2} &
+\multicolumn{1}{c|}{rs1} &
+\multicolumn{1}{c|}{rm} &
+\multicolumn{1}{c|}{rd} &
+\multicolumn{1}{c|}{opcode} \\
+\hline
+5 & 2 & 5 & 5 & 3 & 5 & 7 \\
+FMV.X.H & H & 0 & src & 000 & dest & OP-FP \\
+FMV.H.X & H & 0 & src & 000 & dest & OP-FP \\
+\end{tabular}
+\end{center}
+
+\section{Half-Precision Floating-Point Compare Instructions}
+
+The half-precision floating-point compare instructions are
+defined analogously to their single-precision counterparts, but operate on
+half-precision operands.
+
+\vspace{-0.2in}
+\begin{center}
+\begin{tabular}{S@{}F@{}R@{}R@{}F@{}R@{}O}
+\\
+\instbitrange{31}{27} &
+\instbitrange{26}{25} &
+\instbitrange{24}{20} &
+\instbitrange{19}{15} &
+\instbitrange{14}{12} &
+\instbitrange{11}{7} &
+\instbitrange{6}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct5} &
+\multicolumn{1}{c|}{fmt} &
+\multicolumn{1}{c|}{rs2} &
+\multicolumn{1}{c|}{rs1} &
+\multicolumn{1}{c|}{rm} &
+\multicolumn{1}{c|}{rd} &
+\multicolumn{1}{c|}{opcode} \\
+\hline
+5 & 2 & 5 & 5 & 3 & 5 & 7 \\
+FCMP & H & src2 & src1 & EQ/LT/LE & dest & OP-FP \\
+\end{tabular}
+\end{center}
+
+\section{Half-Precision Floating-Point Classify Instruction}
+
+The half-precision floating-point classify instruction, FCLASS.H, is
+defined analogously to its single-precision counterpart, but operates on
+half-precision operands.
+
+\vspace{-0.2in}
+\begin{center}
+\begin{tabular}{S@{}F@{}R@{}R@{}F@{}R@{}O}
+\\
+\instbitrange{31}{27} &
+\instbitrange{26}{25} &
+\instbitrange{24}{20} &
+\instbitrange{19}{15} &
+\instbitrange{14}{12} &
+\instbitrange{11}{7} &
+\instbitrange{6}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct5} &
+\multicolumn{1}{c|}{fmt} &
+\multicolumn{1}{c|}{rs2} &
+\multicolumn{1}{c|}{rs1} &
+\multicolumn{1}{c|}{rm} &
+\multicolumn{1}{c|}{rd} &
+\multicolumn{1}{c|}{opcode} \\
+\hline
+5 & 2 & 5 & 5 & 3 & 5 & 7 \\
+FCLASS & H & 0 & src & 001 & dest & OP-FP \\
+\end{tabular}
+\end{center}
+
+\section{``Zfhmin'' Standard Extension for Minimal Half-Precision Floating-Point Support}
+
+{\bf Warning! This draft specification may change before being
+accepted as standard by RISC-V International.}
+
+This section describes the Zfhmin standard extension, which provides minimal
+support for 16-bit half-precision binary floating-point instructions.
+The Zfhmin extension is a subset of the Zfh extension, consisting only
+of data transfer and conversion instructions.
+Like Zfh, the Zfhmin extension depends on the single-precision floating-point
+extension, F.
+The expectation is that Zfhmin software primarily uses the half-precision
+format for storage, performing most computation in higher precision.
+
+The Zfhmin extension includes the following instructions from the Zfh
+extension: FLH, FSH, FMV.X.H, FMV.H.X, FCVT.S.H, and FCVT.H.S.
+If the D extension is present, the FCVT.D.H and FCVT.H.D instructions are
+also included.
+If the Q extension is present, the FCVT.Q.H and FCVT.H.Q instructions are
+additionally included.
+
+\begin{commentary}
+Zfhmin does not include the FSGNJ.H instruction, because it suffices to
+instead use the FSGNJ.S instruction to move half-precision values between
+floating-point registers.
+\end{commentary}
+
+\begin{commentary}
+Half-precision addition, subtraction, multiplication, division, and
+square-root operations can be faithfully emulated by converting the
+half-precision operands to single-precision, performing the operation
+using single-precision arithmetic, then converting back to
+half-precision~\cite{roux:hal-01091186}.
+Performing half-precision fused multiply-addition using this method incurs
+a 1-ulp error on some inputs for the RNE and RMM rounding modes.
+
+Conversion from 8- or 16-bit integers to half-precision can be emulated by
+first converting to single-precision, then converting to half-precision.
+Conversion from 32-bit integer can be emulated by first converting to
+double-precision.
+If the D extension is not present and a 1-ulp error under RNE or RMM is
+tolerable, 32-bit integers can be first converted to single-precision instead.
+The same remark applies to conversions from 64-bit integers without the Q
+extension.
+\end{commentary}