diff options
Diffstat (limited to 'src/zfh.tex')
-rw-r--r-- | src/zfh.tex | 422 |
1 files changed, 422 insertions, 0 deletions
diff --git a/src/zfh.tex b/src/zfh.tex new file mode 100644 index 0000000..3ccebf9 --- /dev/null +++ b/src/zfh.tex @@ -0,0 +1,422 @@ +\chapter{``Zfh'' and ``Zfhmin'' Standard Extensions for Half-Precision Floating-Point, + Version 0.1} + +{\bf Warning! This draft specification may change before being +accepted as standard by RISC-V International.} + +This chapter describes the Zfh standard extension for 16-bit half-precision +binary floating-point instructions compliant with the IEEE 754-2008 arithmetic +standard. +The Zfh extension depends on the single-precision floating-point extension, F. +The NaN-boxing scheme described in Section~\ref{nanboxing} is extended to +allow a half-precision value to be NaN-boxed inside a single-precision value +(which may be recursively NaN-boxed inside a double- or quad-precision value +when the D or Q extension is present). + +\begin{commentary} +This extension primarily provides instructions that consume half-precision +operands and produce half-precision results. +However, it is also common to compute on half-precision data using higher +intermediate precision. +Although this extension provides explicit conversion instructions that suffice +to implement that pattern, future extensions might further accelerate such +computation with additional instructions that implicitly widen their +operands---e.g., half$\times$half$+$single$\rightarrow$single---or implicitly +narrow their results---e.g., half$+$single$\rightarrow$half. +\end{commentary} + +\section{Half-Precision Load and Store Instructions} + +New 16-bit variants of LOAD-FP and STORE-FP instructions are added, +encoded with a new value for the funct3 width field. + +\vspace{-0.2in} +\begin{center} +\begin{tabular}{M@{}R@{}F@{}R@{}O} +\\ +\instbitrange{31}{20} & +\instbitrange{19}{15} & +\instbitrange{14}{12} & +\instbitrange{11}{7} & +\instbitrange{6}{0} \\ +\hline +\multicolumn{1}{|c|}{imm[11:0]} & +\multicolumn{1}{c|}{rs1} & +\multicolumn{1}{c|}{width} & +\multicolumn{1}{c|}{rd} & +\multicolumn{1}{c|}{opcode} \\ +\hline +12 & 5 & 3 & 5 & 7 \\ +offset[11:0] & base & H & dest & LOAD-FP \\ +\end{tabular} +\end{center} + +\vspace{-0.2in} +\begin{center} +\begin{tabular}{O@{}R@{}R@{}F@{}R@{}O} +\\ +\instbitrange{31}{25} & +\instbitrange{24}{20} & +\instbitrange{19}{15} & +\instbitrange{14}{12} & +\instbitrange{11}{7} & +\instbitrange{6}{0} \\ +\hline +\multicolumn{1}{|c|}{imm[11:5]} & +\multicolumn{1}{c|}{rs2} & +\multicolumn{1}{c|}{rs1} & +\multicolumn{1}{c|}{width} & +\multicolumn{1}{c|}{imm[4:0]} & +\multicolumn{1}{c|}{opcode} \\ +\hline +7 & 5 & 5 & 3 & 5 & 7 \\ +offset[11:5] & src & base & H & offset[4:0] & STORE-FP \\ +\end{tabular} +\end{center} + +FLH and FSH are only guaranteed to execute atomically if the effective address +is naturally aligned. + +FLH and FSH do not modify the bits being transferred; in particular, the +payloads of non-canonical NaNs are preserved. +FLH NaN-boxes the result written to {\em rd}, whereas FSH ignores all but +the lower 16 bits in {\em rs2}. + +\section{Half-Precision Computational Instructions} + +A new supported format is added to the format field of most +instructions, as shown in Table~\ref{tab:fpextfmth}. + +\begin{table}[htp] +\begin{center} +\begin{tabular}{|c|c|l|} +\hline +{\em fmt} field & +Mnemonic & +Meaning \\ +\hline +00 & S & 32-bit single-precision \\ +01 & D & 64-bit double-precision \\ +10 & H & 16-bit half-precision \\ +11 & Q & 128-bit quad-precision \\ +\hline +\end{tabular} +\end{center} +\caption{Format field encoding.} +\label{tab:fpextfmth} +\end{table} + +The half-precision floating-point computational instructions are +defined analogously to their single-precision counterparts, but operate on +half-precision operands and produce half-precision results. + +\vspace{-0.2in} +\begin{center} +\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O} +\\ +\instbitrange{31}{27} & +\instbitrange{26}{25} & +\instbitrange{24}{20} & +\instbitrange{19}{15} & +\instbitrange{14}{12} & +\instbitrange{11}{7} & +\instbitrange{6}{0} \\ +\hline +\multicolumn{1}{|c|}{funct5} & +\multicolumn{1}{c|}{fmt} & +\multicolumn{1}{c|}{rs2} & +\multicolumn{1}{c|}{rs1} & +\multicolumn{1}{c|}{rm} & +\multicolumn{1}{c|}{rd} & +\multicolumn{1}{c|}{opcode} \\ +\hline +5 & 2 & 5 & 5 & 3 & 5 & 7 \\ +FADD/FSUB & H & src2 & src1 & RM & dest & OP-FP \\ +FMUL/FDIV & H & src2 & src1 & RM & dest & OP-FP \\ +FMIN-MAX & H & src2 & src1 & MIN/MAX & dest & OP-FP \\ +FSQRT & H & 0 & src & RM & dest & OP-FP \\ +\end{tabular} +\end{center} + +\vspace{-0.2in} +\begin{center} +\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O} +\\ +\instbitrange{31}{27} & +\instbitrange{26}{25} & +\instbitrange{24}{20} & +\instbitrange{19}{15} & +\instbitrange{14}{12} & +\instbitrange{11}{7} & +\instbitrange{6}{0} \\ +\hline +\multicolumn{1}{|c|}{rs3} & +\multicolumn{1}{c|}{fmt} & +\multicolumn{1}{c|}{rs2} & +\multicolumn{1}{c|}{rs1} & +\multicolumn{1}{c|}{rm} & +\multicolumn{1}{c|}{rd} & +\multicolumn{1}{c|}{opcode} \\ +\hline +5 & 2 & 5 & 5 & 3 & 5 & 7 \\ +src3 & H & src2 & src1 & RM & dest & F[N]MADD/F[N]MSUB \\ +\end{tabular} +\end{center} + +\section{Half-Precision Convert and Move Instructions} + +New floating-point-to-integer and integer-to-floating-point conversion +instructions are added. These instructions are defined analogously to the +single-precision-to-integer and integer-to-single-precision conversion +instructions. FCVT.W.H or FCVT.L.H converts a half-precision floating-point +number to a signed 32-bit or 64-bit integer, respectively. FCVT.H.W or +FCVT.H.L converts a 32-bit or 64-bit signed integer, respectively, into a +half-precision floating-point number. FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and +FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and +FCVT.H.L[U] are RV64-only instructions. + +\vspace{-0.2in} +\begin{center} +\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O} +\\ +\instbitrange{31}{27} & +\instbitrange{26}{25} & +\instbitrange{24}{20} & +\instbitrange{19}{15} & +\instbitrange{14}{12} & +\instbitrange{11}{7} & +\instbitrange{6}{0} \\ +\hline +\multicolumn{1}{|c|}{funct5} & +\multicolumn{1}{c|}{fmt} & +\multicolumn{1}{c|}{rs2} & +\multicolumn{1}{c|}{rs1} & +\multicolumn{1}{c|}{rm} & +\multicolumn{1}{c|}{rd} & +\multicolumn{1}{c|}{opcode} \\ +\hline +5 & 2 & 5 & 5 & 3 & 5 & 7 \\ +FCVT.{\em int}.H & H & W[U]/L[U] & src & RM & dest & OP-FP \\ +FCVT.H.{\em int} & H & W[U]/L[U] & src & RM & dest & OP-FP \\ +\end{tabular} +\end{center} + +New floating-point-to-floating-point conversion instructions are added. These +instructions are defined analogously to the double-precision +floating-point-to-floating-point conversion instructions. +FCVT.S.H or FCVT.H.S converts a half-precision floating-point number to +a single-precision floating-point number, or vice-versa, respectively. +If the D extension is present, FCVT.D.H or FCVT.H.D converts a half-precision +floating-point number to a double-precision floating-point number, or +vice-versa, respectively. +If the Q extension is present, FCVT.Q.H or FCVT.H.Q converts a half-precision +floating-point number to a quad-precision floating-point number, or +vice-versa, respectively. + +\vspace{-0.2in} +\begin{center} +\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O} +\\ +\instbitrange{31}{27} & +\instbitrange{26}{25} & +\instbitrange{24}{20} & +\instbitrange{19}{15} & +\instbitrange{14}{12} & +\instbitrange{11}{7} & +\instbitrange{6}{0} \\ +\hline +\multicolumn{1}{|c|}{funct5} & +\multicolumn{1}{c|}{fmt} & +\multicolumn{1}{c|}{rs2} & +\multicolumn{1}{c|}{rs1} & +\multicolumn{1}{c|}{rm} & +\multicolumn{1}{c|}{rd} & +\multicolumn{1}{c|}{opcode} \\ +\hline +5 & 2 & 5 & 5 & 3 & 5 & 7 \\ +FCVT.S.H & S & H & src & RM & dest & OP-FP \\ +FCVT.H.S & H & S & src & RM & dest & OP-FP \\ +FCVT.D.H & D & H & src & RM & dest & OP-FP \\ +FCVT.H.D & H & D & src & RM & dest & OP-FP \\ +FCVT.Q.H & Q & H & src & RM & dest & OP-FP \\ +FCVT.H.Q & H & Q & src & RM & dest & OP-FP \\ +\end{tabular} +\end{center} + +Floating-point to floating-point sign-injection instructions, FSGNJ.H, +FSGNJN.H, and FSGNJX.H are defined analogously to the single-precision +sign-injection instruction. + +\vspace{-0.2in} +\begin{center} +\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O} +\\ +\instbitrange{31}{27} & +\instbitrange{26}{25} & +\instbitrange{24}{20} & +\instbitrange{19}{15} & +\instbitrange{14}{12} & +\instbitrange{11}{7} & +\instbitrange{6}{0} \\ +\hline +\multicolumn{1}{|c|}{funct5} & +\multicolumn{1}{c|}{fmt} & +\multicolumn{1}{c|}{rs2} & +\multicolumn{1}{c|}{rs1} & +\multicolumn{1}{c|}{rm} & +\multicolumn{1}{c|}{rd} & +\multicolumn{1}{c|}{opcode} \\ +\hline +5 & 2 & 5 & 5 & 3 & 5 & 7 \\ +FSGNJ & H & src2 & src1 & J[N]/JX & dest & OP-FP \\ +\end{tabular} +\end{center} + + +Instructions are provided to move bit patterns between the floating-point and +integer registers. +FMV.X.H moves the half-precision value in floating-point register {\em rs1} to +a representation in IEEE 754-2008 standard encoding in integer register {\em +rd}, filling the upper XLEN-16 bits with copies of the floating-point number's +sign bit. + +FMV.H.X moves the half-precision value encoded in IEEE 754-2008 standard +encoding from the lower 16 bits of integer register {\em rs1} to the +floating-point register {\em rd}, NaN-boxing the result. + +FMV.X.H and FMV.H.X do not modify the bits being transferred; in particular, +the payloads of non-canonical NaNs are preserved. + +\vspace{-0.2in} +\begin{center} +\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O} +\\ +\instbitrange{31}{27} & +\instbitrange{26}{25} & +\instbitrange{24}{20} & +\instbitrange{19}{15} & +\instbitrange{14}{12} & +\instbitrange{11}{7} & +\instbitrange{6}{0} \\ +\hline +\multicolumn{1}{|c|}{funct5} & +\multicolumn{1}{c|}{fmt} & +\multicolumn{1}{c|}{rs2} & +\multicolumn{1}{c|}{rs1} & +\multicolumn{1}{c|}{rm} & +\multicolumn{1}{c|}{rd} & +\multicolumn{1}{c|}{opcode} \\ +\hline +5 & 2 & 5 & 5 & 3 & 5 & 7 \\ +FMV.X.H & H & 0 & src & 000 & dest & OP-FP \\ +FMV.H.X & H & 0 & src & 000 & dest & OP-FP \\ +\end{tabular} +\end{center} + +\section{Half-Precision Floating-Point Compare Instructions} + +The half-precision floating-point compare instructions are +defined analogously to their single-precision counterparts, but operate on +half-precision operands. + +\vspace{-0.2in} +\begin{center} +\begin{tabular}{S@{}F@{}R@{}R@{}F@{}R@{}O} +\\ +\instbitrange{31}{27} & +\instbitrange{26}{25} & +\instbitrange{24}{20} & +\instbitrange{19}{15} & +\instbitrange{14}{12} & +\instbitrange{11}{7} & +\instbitrange{6}{0} \\ +\hline +\multicolumn{1}{|c|}{funct5} & +\multicolumn{1}{c|}{fmt} & +\multicolumn{1}{c|}{rs2} & +\multicolumn{1}{c|}{rs1} & +\multicolumn{1}{c|}{rm} & +\multicolumn{1}{c|}{rd} & +\multicolumn{1}{c|}{opcode} \\ +\hline +5 & 2 & 5 & 5 & 3 & 5 & 7 \\ +FCMP & H & src2 & src1 & EQ/LT/LE & dest & OP-FP \\ +\end{tabular} +\end{center} + +\section{Half-Precision Floating-Point Classify Instruction} + +The half-precision floating-point classify instruction, FCLASS.H, is +defined analogously to its single-precision counterpart, but operates on +half-precision operands. + +\vspace{-0.2in} +\begin{center} +\begin{tabular}{S@{}F@{}R@{}R@{}F@{}R@{}O} +\\ +\instbitrange{31}{27} & +\instbitrange{26}{25} & +\instbitrange{24}{20} & +\instbitrange{19}{15} & +\instbitrange{14}{12} & +\instbitrange{11}{7} & +\instbitrange{6}{0} \\ +\hline +\multicolumn{1}{|c|}{funct5} & +\multicolumn{1}{c|}{fmt} & +\multicolumn{1}{c|}{rs2} & +\multicolumn{1}{c|}{rs1} & +\multicolumn{1}{c|}{rm} & +\multicolumn{1}{c|}{rd} & +\multicolumn{1}{c|}{opcode} \\ +\hline +5 & 2 & 5 & 5 & 3 & 5 & 7 \\ +FCLASS & H & 0 & src & 001 & dest & OP-FP \\ +\end{tabular} +\end{center} + +\section{``Zfhmin'' Standard Extension for Minimal Half-Precision Floating-Point Support} + +{\bf Warning! This draft specification may change before being +accepted as standard by RISC-V International.} + +This section describes the Zfhmin standard extension, which provides minimal +support for 16-bit half-precision binary floating-point instructions. +The Zfhmin extension is a subset of the Zfh extension, consisting only +of data transfer and conversion instructions. +Like Zfh, the Zfhmin extension depends on the single-precision floating-point +extension, F. +The expectation is that Zfhmin software primarily uses the half-precision +format for storage, performing most computation in higher precision. + +The Zfhmin extension includes the following instructions from the Zfh +extension: FLH, FSH, FMV.X.H, FMV.H.X, FCVT.S.H, and FCVT.H.S. +If the D extension is present, the FCVT.D.H and FCVT.H.D instructions are +also included. +If the Q extension is present, the FCVT.Q.H and FCVT.H.Q instructions are +additionally included. + +\begin{commentary} +Zfhmin does not include the FSGNJ.H instruction, because it suffices to +instead use the FSGNJ.S instruction to move half-precision values between +floating-point registers. +\end{commentary} + +\begin{commentary} +Half-precision addition, subtraction, multiplication, division, and +square-root operations can be faithfully emulated by converting the +half-precision operands to single-precision, performing the operation +using single-precision arithmetic, then converting back to +half-precision~\cite{roux:hal-01091186}. +Performing half-precision fused multiply-addition using this method incurs +a 1-ulp error on some inputs for the RNE and RMM rounding modes. + +Conversion from 8- or 16-bit integers to half-precision can be emulated by +first converting to single-precision, then converting to half-precision. +Conversion from 32-bit integer can be emulated by first converting to +double-precision. +If the D extension is not present and a 1-ulp error under RNE or RMM is +tolerable, 32-bit integers can be first converted to single-precision instead. +The same remark applies to conversions from 64-bit integers without the Q +extension. +\end{commentary} |