src/zfa.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270

\chapter{``Zfa'' Standard Extension for Additional Floating-Point
Instructions, Version 0.1}
\label{chap:zfa}

{\bf Warning! This draft specification may change before being
accepted as standard by RISC-V International.}

This chapter describes the Zfa standard extension, which adds instructions for
immediate loads,
IEEE 754-2019 minimum and maximum operations,
round-to-integer operations,
and quiet floating-point comparisons.
For RV32D, the Zfa extension also adds instructions to transfer
double-precision floating-point values to and from integer registers, and for
RV64Q, it adds analogous instructions for quad-precision floating-point
values.
The Zfa extension depends on the F extension.


\section{Load-Immediate Instructions}

The FLI.S instruction loads one of 32 single-precision
floating-point constants, encoded in the {\em rs1} field, into
floating-point register {\em rd}.
The correspondence of {\em rs1} field values and single-precision
floating-point values is shown in Table~\ref{tab:flis}.
FLI.S is encoded like FMV.W.X, but with {\em rs2}=1.

\begin{table}[h!]
\center
\begin{tabular}{|r|r|c|c|c|}
\hline
{\em rs1}   & Value                            & Sign    & Exponent       & Significand     \\
\hline
 0          & $-1.0$                           & {\tt 1} & {\tt 01111111} & {\tt 000...000} \\
 1          & {\em Minimum positive normal}    & {\tt 0} & {\tt 00000001} & {\tt 000...000} \\
 2          & $1.0 \times 2^{-16}$             & {\tt 0} & {\tt 01101111} & {\tt 000...000} \\
 3          & $1.0 \times 2^{-15}$             & {\tt 0} & {\tt 01110000} & {\tt 000...000} \\
 4          & $1.0 \times 2^{-8}$              & {\tt 0} & {\tt 01110111} & {\tt 000...000} \\
 5          & $1.0 \times 2^{-7}$              & {\tt 0} & {\tt 01111000} & {\tt 000...000} \\
 6          & 0.0625 ($2^{-4}$)                & {\tt 0} & {\tt 01111011} & {\tt 000...000} \\
 7          & 0.125 ($2^{-3}$)                 & {\tt 0} & {\tt 01111100} & {\tt 000...000} \\
 8          & 0.25                             & {\tt 0} & {\tt 01111101} & {\tt 000...000} \\
 9          & 0.3125                           & {\tt 0} & {\tt 01111101} & {\tt 010...000} \\
10          & 0.375                            & {\tt 0} & {\tt 01111101} & {\tt 100...000} \\
11          & 0.4375                           & {\tt 0} & {\tt 01111101} & {\tt 110...000} \\
12          & 0.5                              & {\tt 0} & {\tt 01111110} & {\tt 000...000} \\
13          & 0.625                            & {\tt 0} & {\tt 01111110} & {\tt 010...000} \\
14          & 0.75                             & {\tt 0} & {\tt 01111110} & {\tt 100...000} \\
15          & 0.875                            & {\tt 0} & {\tt 01111110} & {\tt 110...000} \\
16          & 1.0                              & {\tt 0} & {\tt 01111111} & {\tt 000...000} \\
17          & 1.25                             & {\tt 0} & {\tt 01111111} & {\tt 010...000} \\
18          & 1.5                              & {\tt 0} & {\tt 01111111} & {\tt 100...000} \\
19          & 1.75                             & {\tt 0} & {\tt 01111111} & {\tt 110...000} \\
20          & 2.0                              & {\tt 0} & {\tt 10000000} & {\tt 000...000} \\
21          & 2.5                              & {\tt 0} & {\tt 10000000} & {\tt 010...000} \\
22          & 3                                & {\tt 0} & {\tt 10000000} & {\tt 100...000} \\
23          & 4                                & {\tt 0} & {\tt 10000001} & {\tt 000...000} \\
24          & 8                                & {\tt 0} & {\tt 10000010} & {\tt 000...000} \\
25          & 16                               & {\tt 0} & {\tt 10000011} & {\tt 000...000} \\
26          & 128 ($2^7$)                      & {\tt 0} & {\tt 10000110} & {\tt 000...000} \\
27          & 256 ($2^8$)                      & {\tt 0} & {\tt 10000111} & {\tt 000...000} \\
28          & $2^{15}$                         & {\tt 0} & {\tt 10001110} & {\tt 000...000} \\
29          & $2^{16}$                         & {\tt 0} & {\tt 10001111} & {\tt 000...000} \\
30          & $+\infty$                        & {\tt 0} & {\tt 11111111} & {\tt 000...000} \\
31          & {\em Canonical NaN}              & {\tt 0} & {\tt 11111111} & {\tt 100...000} \\
\hline
\end{tabular}
\caption{Immediate values loaded by the FLI.S instruction.}
\label{tab:flis}
\end{table}

\begin{commentary}
The preferred assembly syntax for entries 1, 30, and 31 is {\tt min},
{\tt inf}, and {\tt nan}, respectively.
For entries 0 through 29 (including entry 1), the assembler will accept
decimal constants in C-like syntax.
\end{commentary}

\begin{commentary}
The set of 32 constants was chosen by examining floating-point libraries,
including the C standard math library, and to optimize fixed-point to
floating-point conversion.

Entries 8--22 follow a regular encoding pattern.
No entry sets mantissa bits other than the two most significant ones.
\end{commentary}

If the D extension is implemented, FLI.D performs the analogous operation,
but loads a double-precision value into floating-point register {\em rd}.
Note that entry 1 (corresponding to the minimum positive normal value) has a
numerically different value for double-precision than for single-precision.
FLI.D is encoded like FLI.S, but with {\em fmt}=D.

If the Q extension is implemented, FLI.Q performs the analogous operation,
but loads a quad-precision value into floating-point register {\em rd}.
Note that entry 1 (corresponding to the minimum positive normal value) has a
numerically different value for quad-precision.
FLI.Q is encoded like FLI.S, but with {\em fmt}=Q.

If the Zfh or Zvfh extension is implemented, FLI.H performs the analogous
operation, but loads a half-precision floating-point value into register
{\em rd}.
Note that entry 1 (corresponding to the minimum positive normal value) has a
numerically different value for half-precision.
Furthermore,
since $2^{16}$ is not representable in half-precision floating-point, entry 29
in the table instead loads positive infinity---i.e., it is redundant
with entry 30.
FLI.H is encoded like FLI.S, but with {\em fmt}=H.

\begin{commentary}
Additionally, since $2^{-16}$ is a subnormal in half-precision, entry 1 is numerically
greater than entry 2 for FLI.H.
\end{commentary}

The FLI.{\em fmt} instructions never set any floating-point exception flags.


\section{Minimum and Maximum Instructions}

The FMINM.S and FMAXM.S instructions are defined like the FMIN.S and FMAX.S
instructions, except that if either input is NaN, the result is the
canonical NaN.

If the D extension is implemented, FMINM.D and FMAXM.D instructions are
analogously defined to operate on double-precision numbers.

If the Zfh extension is implemented, FMINM.H and FMAXM.H instructions are
analogously defined to operate on half-precision numbers.

If the Q extension is implemented, FMINM.Q and FMAXM.Q instructions are
analogously defined to operate on quad-precision numbers.

These instructions are encoded like their FMIN and FMAX counterparts, but
with instruction bit 13 set to 1.

\begin{commentary}
These instructions implement the IEEE 754-2019 minimum and maximum operations.
\end{commentary}


\section{Round-to-Integer Instructions}

The FROUND.S instruction rounds the single-precision floating-point number in
floating-point register {\em rs1} to an integer, according to the rounding
mode specified in the instruction's {\em rm} field.
It then writes that integer, represented as a single-precision floating-point
number, to floating-point register {\em rd}.
Zero and infinite inputs are copied to {\em rd} unmodified.
Signaling NaN inputs cause the invalid operation exception flag to be set; no
other exception flags are set.
FROUND.S is encoded like FCVT.S.D, but with {\em rs2}=4.

The FROUNDNX.S instruction is defined similarly, but it also sets the inexact
exception flag if the input differs from the rounded result and is not NaN.
FROUNDNX.S is encoded like FCVT.S.D, but with {\em rs2}=5.

If the D extension is implemented, FROUND.D and FROUNDNX.D instructions are
analogously defined to operate on double-precision numbers.
They are encoded like FCVT.D.S, but with {\em rs2}=4 and 5, respectively,

If the Zfh extension is implemented, FROUND.H and FROUNDNX.H instructions are
analogously defined to operate on half-precision numbers.
They are encoded like FCVT.H.S, but with {\em rs2}=4 and 5, respectively,

If the Q extension is implemented, FROUND.Q and FROUNDNX.Q instructions are
analogously defined to operate on quad-precision numbers.
They are encoded like FCVT.Q.S, but with {\em rs2}=4 and 5, respectively,

\begin{commentary}
The FROUNDNX.{\em fmt} instructions implement the IEEE 754-2019
roundToIntegralExact operation, and the FROUND.{\em fmt} instructions
implement the other operations in the roundToIntegral family.
\end{commentary}


\section{Modular Convert-to-Integer Instruction}

The FCVTMOD.W.D instruction is defined similarly to the FCVT.W.D
instruction, with the following differences.
FCVTMOD.W.D always rounds towards zero.
Bits 31:0 are taken from the rounded, unbounded two's complement result,
then sign-extended to XLEN bits and written to integer register {\em rd}.
$\pm\infty$ and NaN are converted to zero.

Floating-point exception flags are raised the same as they would be for
FCVT.W.D with the same input operand.

This instruction is only provided if the D extension is implemented.
It is encoded like FCVT.W.D, but with the {\rm rs2} field set to 8
and the {\em rm} field set to 1 (RTZ).
Other {\em rm} values are {\em reserved}.

\begin{commentary}
The assembly syntax requires the RTZ rounding mode to be explicitly
specified, i.e., {\tt fcvtmod.w.d rd, rs1, rtz}.
\end{commentary}

\begin{commentary}
The FCVTMOD.W.D instruction was added principally to accelerate the
processing of JavaScript {\tt Number}s.
{\tt Number}s are double-precision values, but some operators implicitly
truncate them to signed integers mod $2^{32}$.
\end{commentary}


\section{Move Instructions}

For RV32 only, if the D extension is implemented,
the FMVH.X.D instruction moves bits 63:32 of floating-point register {\em rs1}
into integer register {\em rd}.
It is encoded in the OP-FP major opcode with {\em funct3}=0, {\em rs2}=1,
and {\em funct7}=1110001.

\begin{commentary}
FMVH.X.D is used in conjunction with the existing FMV.X.W instruction to move
a double-precision floating-point number to a pair of x-registers.
\end{commentary}

For RV32 only, if the D extension is implemented,
the FMVP.D.X instruction moves a double-precision number from a pair of integer
registers into a floating-point register.  Integer registers {\em rs1} and
{\em rs2} supply bits 31:0 and 63:32, respectively; the result is written to
floating-point register {\em rd}.
FMVP.D.X is encoded in the OP-FP major opcode with {\em funct3}=0
and {\em funct7}=1011001.

For RV64 only, if the Q extension is implemented,
the FMVH.X.Q instruction moves bits 127:64 of floating-point register {\em rs1}
into integer register {\em rd}.
It is encoded in the OP-FP major opcode with {\em funct3}=0, {\em rs2}=1,
and {\em funct7}=1110011.

\begin{commentary}
FMVH.X.Q is used in conjunction with the existing FMV.X.D instruction to move
a quad-precision floating-point number to a pair of x-registers.
\end{commentary}

For RV64 only, if the Q extension is implemented,
the FMVP.Q.X instruction moves a double-precision number from a pair of integer
registers into a floating-point register.  Integer registers {\em rs1} and
{\em rs2} supply bits 63:0 and 127:64, respectively; the result is written to
floating-point register {\em rd}.
FMVP.Q.X is encoded in the OP-FP major opcode with {\em funct3}=0
and {\em funct7}=1011011.


\section{Comparison Instructions}

The FLEQ.S and FLTQ.S instructions are defined like the FLE.S and FLT.S
instructions, except that quiet NaN inputs do not cause the invalid
operation exception flag to be set.

If the D extension is implemented, FLEQ.D and FLTQ.D instructions are
analogously defined to operate on double-precision numbers.

If the Zfh extension is implemented, FLEQ.H and FLTQ.H instructions are
analogously defined to operate on half-precision numbers.

If the Q extension is implemented, FLEQ.Q and FLTQ.Q instructions are
analogously defined to operate on quad-precision numbers.

These instructions are encoded like their FLE and FLT counterparts, but
with instruction bit 14 set to 1.

\begin{commentary}
We do not expect analogous comparison instructions will be added to the vector
ISA, since they can be reasonably efficiently emulated using masking.
\end{commentary}