Finalized documentation for TestFloat Release 3.

author: John Hauser <jhauser@eecs.berkeley.edu> 2014-12-17 19:09:39 -0800
committer: John Hauser <jhauser@eecs.berkeley.edu> 2014-12-17 19:09:39 -0800
commit: cec54960bbbfa351cab7dab75eb1418585e4fe64
tree: 8c606f0c513bc0ef9582795bd159be8dcffaf565 /doc/TestFloat-general.html
parent: 86cdc156a7c1bb471c11b14d65b9d2b48b714935
download: berkeley-testfloat-3-cec54960bbbfa351cab7dab75eb1418585e4fe64.zip
berkeley-testfloat-3-cec54960bbbfa351cab7dab75eb1418585e4fe64.tar.gz
berkeley-testfloat-3-cec54960bbbfa351cab7dab75eb1418585e4fe64.tar.bz2
1 files changed, 305 insertions, 202 deletions
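The documentation changed in this commit describes how TestFloat writes floating-point values in its error lines. For 64-bit double precision, the value is written as 3 hexadecimal digits holding the sign and exponent, a period, and 13 digits of fraction. For 32-bit single precision, the value is written as 9 digits whose first digit holds only the sign (`0` or `8`). The C sketch below reproduces that notation for a host `double` and `float`. It is not taken from TestFloat. The function names `format_f64` and `format_f32` are hypothetical, and the code assumes the host `double` and `float` are IEEE binary64 and binary32.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: the host double is IEEE binary64 and float is IEEE binary32. */
_Static_assert(sizeof(double) == 8, "double must be 64 bits");
_Static_assert(sizeof(float) == 4, "float must be 32 bits");

/* Hypothetical helper: writes a double in TestFloat-style notation.
   The first 3 hex digits hold the sign bit and the 11-bit biased exponent.
   A period follows, then 13 hex digits of the 52-bit fraction.
   'out' must hold at least 18 chars. */
static void format_f64(double value, char out[18])
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);   /* reinterpret the raw encoding */
    int n = snprintf(out, 18, "%03X.%013" PRIX64,
                     (unsigned) (bits >> 52),
                     bits & UINT64_C(0x000FFFFFFFFFFFFF));
    assert(n == 17);                      /* 3 + 1 + 13 characters */
    (void) n;
}

/* Hypothetical helper: writes a float as 9 hex digits with a period after
   the 3rd. The first digit holds only the sign (0 or 8). The next two hold
   the 8-bit biased exponent, and the last six hold the 23-bit fraction, so
   the first fraction digit is at most 7. 'out' must hold at least 11 chars. */
static void format_f32(float value, char out[11])
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    int n = snprintf(out, 11, "%X%02X.%06X",
                     (unsigned) ((bits >> 31) ? 8u : 0u),
                     (unsigned) ((bits >> 23) & 0xFFu),
                     (unsigned) (bits & 0x7FFFFFu));
    assert(n == 10);                      /* 9 digits plus the period */
    (void) n;
}
```

For example, `format_f32(-1.0f, buf)` should produce `87F.000000`, and `format_f64(1.0, buf)` should produce `3FF.0000000000000`. Both results match the tables of notable values in the documentation.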
diff --git a/doc/TestFloat-general.html b/doc/TestFloat-general.html
index 1618d4a..d72807e 100644
--- a/doc/TestFloat-general.html
+++ b/doc/TestFloat-general.html
@@ -11,49 +11,38 @@
 
 <P>
 John R. Hauser<BR>
-2014 ______<BR>
-</P>
-
-<P>
-*** CONTENT DONE.
-</P>
-
-<P>
-*** REPLACE QUOTATION MARKS.
-<BR>
-*** REPLACE APOSTROPHES.
-<BR>
-*** REPLACE EM DASH.
+2014 Dec 17<BR>
 </P>
 
 
 <H2>Contents</H2>
 
-<P>
-*** CHECK.<BR>
-*** FIX FORMATTING.
-</P>
-
-<PRE>
-    Introduction
-    Limitations
-    Acknowledgments and License
-    What TestFloat Does
-    Executing TestFloat
-    Operations Tested by TestFloat
-        Conversion Operations
-        Basic Arithmetic Operations
-        Fused Multiply-Add Operations
-        Remainder Operations
-        Round-to-Integer Operations
-        Comparison Operations
-    Interpreting TestFloat Output
-    Variations Allowed by the IEEE Floating-Point Standard
-        Underflow
-        NaNs
-        Conversions to Integer
-    Contact Information
-</PRE>
+<BLOCKQUOTE>
+<TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0>
+<COL WIDTH=25>
+<COL WIDTH=*>
+<TR><TD COLSPAN=2>1. Introduction</TD></TR>
+<TR><TD COLSPAN=2>2. Limitations</TD></TR>
+<TR><TD COLSPAN=2>3. Acknowledgments and License</TD></TR>
+<TR><TD COLSPAN=2>4. What TestFloat Does</TD></TR>
+<TR><TD COLSPAN=2>5. Executing TestFloat</TD></TR>
+<TR><TD COLSPAN=2>6. Operations Tested by TestFloat</TD></TR>
+<TR><TD></TD><TD>6.1. Conversion Operations</TD></TR>
+<TR><TD></TD><TD>6.2. Basic Arithmetic Operations</TD></TR>
+<TR><TD></TD><TD>6.3. Fused Multiply-Add Operations</TD></TR>
+<TR><TD></TD><TD>6.4. Remainder Operations</TD></TR>
+<TR><TD></TD><TD>6.5. Round-to-Integer Operations</TD></TR>
+<TR><TD></TD><TD>6.6. Comparison Operations</TD></TR>
+<TR><TD COLSPAN=2>7. Interpreting TestFloat Output</TD></TR>
+<TR>
+  <TD COLSPAN=2>8. Variations Allowed by the IEEE Floating-Point Standard</TD>
+</TR>
+<TR><TD></TD><TD>8.1. Underflow</TD></TR>
+<TR><TD></TD><TD>8.2. NaNs</TD></TR>
+<TR><TD></TD><TD>8.3. Conversions to Integer</TD></TR>
+<TR><TD COLSPAN=2>9. Contact Information</TD></TR>
+</TABLE>
+</BLOCKQUOTE>
 
 
 <H2>1. Introduction</H2>
@@ -89,8 +78,8 @@ Details about the standard are available elsewhere.
 
 <P>
 The current version of TestFloat is <NOBR>Release 3</NOBR>.
-The set of TestFloat programs as well as the programs' arguments and behavior
-have changed some compared to earlier TestFloat releases.
+The set of TestFloat programs as well as the programs&rsquo; arguments and
+behavior have changed some compared to earlier TestFloat releases.
 </P>
 
 
@@ -119,15 +108,20 @@ bugs can be found through links posted on the TestFloat Web page
 The TestFloat package was written by me, <NOBR>John R.</NOBR> Hauser.
 <NOBR>Release 3</NOBR> of TestFloat is a completely new implementation
 supplanting earlier releases.
-This project was done in the employ of the University of California, Berkeley,
-within the Department of Electrical Engineering and Computer Sciences, first
-for the Parallel Computing Laboratory (Par Lab) and then for the ASPIRE Lab.
+This project (<NOBR>Release 3</NOBR> only, not earlier releases) was done in
+the employ of the University of California, Berkeley, within the Department of
+Electrical Engineering and Computer Sciences, first for the Parallel Computing
+Laboratory (Par Lab) and then for the ASPIRE Lab.
 The work was officially overseen by Prof. Krste Asanovic, with funding provided
 by these sources:
 <BLOCKQUOTE>
 <TABLE>
+<COL WIDTH=*>
+<COL WIDTH=10>
+<COL WIDTH=*>
 <TR>
-<TD><NOBR>Par Lab:</NOBR></TD>
+<TD VALIGN=TOP><NOBR>Par Lab:</NOBR></TD>
+<TD></TD>
 <TD>
 Microsoft (Award #024263), Intel (Award #024894), and U.C. Discovery
 (Award #DIG07-10227), with additional support from Par Lab affiliates Nokia,
@@ -135,7 +129,8 @@ NVIDIA, Oracle, and Samsung.
 </TD>
 </TR>
 <TR>
-<TD><NOBR>ASPIRE Lab:</NOBR></TD>
+<TD VALIGN=TOP><NOBR>ASPIRE Lab:</NOBR></TD>
+<TD></TD>
 <TD>
 DARPA PERFECT program (Award #HR0011-12-2-0016), with additional support from
 ASPIRE industrial sponsor Intel and ASPIRE affiliates Google, Nokia, NVIDIA,
@@ -191,8 +186,8 @@ ENHANCEMENTS, OR MODIFICATIONS.
 
 <P>
 TestFloat is designed to test a floating-point implementation by comparing its
-behavior with that of TestFloat's own internal floating-point implemented in
-software.
+behavior with that of TestFloat&rsquo;s own internal floating-point implemented
+in software.
 For each operation to be tested, the TestFloat programs can generate a large
 number of test cases, made up of simple pattern tests intermixed with weighted
 random inputs.
@@ -263,19 +258,20 @@ for programs <CODE>testfloat_ver</CODE> and <CODE>testfloat</CODE>.
 TestFloat normally compares an implementation of floating-point against the
 Berkeley SoftFloat software implementation of floating-point, also created by
 me.
-The SoftFloat functions are linked into each TestFloat program's executable.
+The SoftFloat functions are linked into each TestFloat program&rsquo;s
+executable.
 Information about SoftFloat can be found at the Web page
 <A HREF="http://www.jhauser.us/arithmetic/SoftFloat.html"><CODE>http://www.jhauser.us/arithmetic/SoftFloat.html</CODE></A>.
 </P>
 
 <P>
 For testing SoftFloat itself, the TestFloat package includes a
-<CODE>testsoftfloat</CODE> program that compares SoftFloat's floating-point
-against <EM>another</EM> software floating-point implementation.
+<CODE>testsoftfloat</CODE> program that compares SoftFloat&rsquo;s
+floating-point against <EM>another</EM> software floating-point implementation.
 The second software floating-point is simpler and slower than SoftFloat, and is
 completely independent of SoftFloat.
 Although the second software floating-point cannot be guaranteed to be
-bug-free, the chance that it would mimic any of SoftFloat's bugs is low.
+bug-free, the chance that it would mimic any of SoftFloat&rsquo;s bugs is low.
 Consequently, an error in one or the other floating-point version should appear
 as an unexpected difference between the two implementations.
 Note that testing SoftFloat should be necessary only when compiling a new
@@ -347,9 +343,11 @@ These results can then be piped to <CODE>testfloat_ver</CODE> to be checked for
 correctness.
 Assuming a vertical bar (<CODE>|</CODE>) indicates a pipe between programs, the
 complete process could be written as a single command like so:
+<BLOCKQUOTE>
 <PRE>
-     testfloat_gen ... &lt;type&gt; | &lt;program-that-invokes-op&gt; | testfloat_ver ... &lt;function&gt;
+testfloat_gen ... &lt;type&gt; | &lt;program-that-invokes-op&gt; | testfloat_ver ... &lt;function&gt;
 </PRE>
+</BLOCKQUOTE>
 The program in the middle is not supplied by TestFloat but must be created
 independently.
 If for some reason this program cannot take command-line arguments, the
@@ -363,9 +361,11 @@ A second method for running TestFloat is similar but has
 expected results for each case.
 With this additional information, the job done by <CODE>testfloat_ver</CODE>
 can be folded into the invoking program to give the following command:
+<BLOCKQUOTE>
 <PRE>
-     testfloat_gen ... &lt;function&gt; | &lt;program-that-invokes-op-and-compares-results&gt;
+testfloat_gen ... &lt;function&gt; | &lt;program-that-invokes-op-and-compares-results&gt;
 </PRE>
+</BLOCKQUOTE>
 Again, the program that actually invokes the floating-point operation is not
 supplied by TestFloat but must be created independently.
 Depending on circumstance, it may be preferable either to let
@@ -429,8 +429,8 @@ multiplication, division, and square root operations;
 for each format, the floating-point remainder operation defined by the IEEE
 Standard;
 <LI>
-for each format, a ``round to integer'' operation that rounds to the nearest
-integer value in the same format; and
+for each format, a &ldquo;round to integer&rdquo; operation that rounds to the
+nearest integer value in the same format; and
 <LI>
 comparisons between two values in the same floating-point format.
 </UL>
@@ -451,8 +451,8 @@ is called <CODE>f32</CODE>, <NOBR>64-bit</NOBR> double-precision is
 <CODE>extF80</CODE>, and <NOBR>128-bit</NOBR> quadruple-precision is
 <CODE>f128</CODE>.
 TestFloat generally uses the same names for operations as Berkeley SoftFloat,
-except that TestFloat's names never include the <CODE>M</CODE> that SoftFloat
-uses to indicate that values are passed through pointers.
+except that TestFloat&rsquo;s names never include the <CODE>M</CODE> that
+SoftFloat uses to indicate that values are passed through pointers.
 </P>
 
 <H3>6.1. Conversion Operations</H3>
@@ -462,21 +462,23 @@ All conversions among the floating-point formats and all conversions between a
 floating-point format and <NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR> integers
 can be tested.
 The conversion operations are:
+<BLOCKQUOTE>
 <PRE>
-     ui32_to_f32      ui64_to_f32      i32_to_f32       i64_to_f32
-     ui32_to_f64      ui64_to_f64      i32_to_f64       i64_to_f64
-     ui32_to_extF80   ui64_to_extF80   i32_to_extF80    i64_to_extF80
-     ui32_to_f128     ui64_to_f128     i32_to_f128      i64_to_f128
-
-     f32_to_ui32      f64_to_ui32      extF80_to_ui32   f128_to_ui32
-     f32_to_ui64      f64_to_ui64      extF80_to_ui64   f128_to_ui64
-     f32_to_i32       f64_to_i32       extF80_to_i32    f128_to_i32
-     f32_to_i64       f64_to_i64       extF80_to_i64    f128_to_i64
-
-     f32_to_f64       f64_to_f32       extF80_to_f32    f128_to_f32
-     f32_to_extF80    f64_to_extF80    extF80_to_f64    f128_to_f64
-     f32_to_f128      f64_to_f128      extF80_to_f128   f128_to_extF80
+ui32_to_f32      ui64_to_f32      i32_to_f32       i64_to_f32
+ui32_to_f64      ui64_to_f64      i32_to_f64       i64_to_f64
+ui32_to_extF80   ui64_to_extF80   i32_to_extF80    i64_to_extF80
+ui32_to_f128     ui64_to_f128     i32_to_f128      i64_to_f128
+
+f32_to_ui32      f64_to_ui32      extF80_to_ui32   f128_to_ui32
+f32_to_ui64      f64_to_ui64      extF80_to_ui64   f128_to_ui64
+f32_to_i32       f64_to_i32       extF80_to_i32    f128_to_i32
+f32_to_i64       f64_to_i64       extF80_to_i64    f128_to_i64
+
+f32_to_f64       f64_to_f32       extF80_to_f32    f128_to_f32
+f32_to_extF80    f64_to_extF80    extF80_to_f64    f128_to_f64
+f32_to_f128      f64_to_f128      extF80_to_f128   f128_to_extF80
 </PRE>
+</BLOCKQUOTE>
 Abbreviations <CODE>ui32</CODE> and <CODE>ui64</CODE> indicate
 <NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR> unsigned integer types, while
 <CODE>i32</CODE> and <CODE>i64</CODE> indicate their signed counterparts.
@@ -495,22 +497,27 @@ operations requires amendment.
 For <CODE>testfloat</CODE> only, conversions to an integer type have names that
 explicitly specify the rounding mode and treatment of inexactness.
 Thus, instead of
+<BLOCKQUOTE>
 <PRE>
-     &lt;float&gt;_to_&lt;int&gt;
+&lt;float&gt;_to_&lt;int&gt;
 </PRE>
+</BLOCKQUOTE>
 as listed above, operations converting to integer type have names of these
 forms:
+<BLOCKQUOTE>
 <PRE>
-     &lt;float&gt;_to_&lt;int&gt;_r_&lt;round&gt;
-     &lt;float&gt;_to_&lt;int&gt;_rx_&lt;round&gt;
+&lt;float&gt;_to_&lt;int&gt;_r_&lt;round&gt;
+&lt;float&gt;_to_&lt;int&gt;_rx_&lt;round&gt;
 </PRE>
-The <CODE>&lt;round&gt;</CODE> component is one of `<CODE>near_even</CODE>',
-`<CODE>near_maxMag</CODE>', `<CODE>minMag</CODE>', `<CODE>min</CODE>', or
-`<CODE>max</CODE>', choosing the rounding mode.
+</BLOCKQUOTE>
+The <CODE>&lt;round&gt;</CODE> component is one of
+&lsquo;<CODE>near_even</CODE>&rsquo;, &lsquo;<CODE>near_maxMag</CODE>&rsquo;,
+&lsquo;<CODE>minMag</CODE>&rsquo;, &lsquo;<CODE>min</CODE>&rsquo;, or
+&lsquo;<CODE>max</CODE>&rsquo;, choosing the rounding mode.
 Any other indication of rounding mode is ignored.
-The operations with `<CODE>_r_</CODE>' in their names never raise the
-<I>inexact</I> exception, while those with `<CODE>_rx_</CODE>' raise the
-<I>inexact</I> exception whenever the result is not exact.
+The operations with &lsquo;<CODE>_r_</CODE>&rsquo; in their names never raise
+the <I>inexact</I> exception, while those with &lsquo;<CODE>_rx_</CODE>&rsquo;
+raise the <I>inexact</I> exception whenever the result is not exact.
 </P>
 
 <P>
@@ -518,7 +525,8 @@ TestFloat assumes that conversions from floating-point to an integer type
 should raise the <I>invalid</I> exception if the input cannot be rounded to an
 integer representable by the result format.
 In such a circumstance, if the result type is an unsigned integer, TestFloat
-expects the result of the operation to be the type's largest integer value.
+expects the result of the operation to be the type&rsquo;s largest integer
+value.
 If the result type is a signed integer and conversion overflows, TestFloat
 expects the result to be the largest-magnitude integer with the same sign as
 the input.
@@ -533,12 +541,14 @@ exception.
 
 <P>
 The following standard arithmetic operations can be tested:
+<BLOCKQUOTE>
 <PRE>
-     f32_add      f32_sub      f32_mul      f32_div      f32_sqrt
-     f64_add      f64_sub      f64_mul      f64_div      f64_sqrt
-     extF80_add   extF80_sub   extF80_mul   extF80_div   extF80_sqrt
-     f128_add     f128_sub     f128_mul     f128_div     f128_sqrt
+f32_add      f32_sub      f32_mul      f32_div      f32_sqrt
+f64_add      f64_sub      f64_mul      f64_div      f64_sqrt
+extF80_add   extF80_sub   extF80_mul   extF80_div   extF80_sqrt
+f128_add     f128_sub     f128_mul     f128_div     f128_sqrt
 </PRE>
+</BLOCKQUOTE>
 The double-extended-precision (<CODE>extF80</CODE>) operations can be rounded
 to reduced precision under rounding precision control.
 </P>
@@ -550,11 +560,13 @@ For all floating-point formats except <NOBR>80-bit</NOBR>
 double-extended-precision, TestFloat can test the fused multiply-add operation
 defined by the 2008 IEEE Floating-Point Standard.
 The fused multiply-add operations are:
+<BLOCKQUOTE>
 <PRE>
-     f32_mulAdd
-     f64_mulAdd
-     f128_mulAdd
+f32_mulAdd
+f64_mulAdd
+f128_mulAdd
 </PRE>
+</BLOCKQUOTE>
 </P>
 
 <P>
@@ -566,29 +578,34 @@ exception even if the third operand is a NaN.
 <H3>6.4. Remainder Operations</H3>
 
 <P>
-For each format, TestFloat can test the IEEE Standard's remainder operation.
+For each format, TestFloat can test the IEEE Standard&rsquo;s remainder
+operation.
 These operations are:
+<BLOCKQUOTE>
 <PRE>
-     f32_rem
-     f64_rem
-     extF80_rem
-     f128_rem
+f32_rem
+f64_rem
+extF80_rem
+f128_rem
 </PRE>
+</BLOCKQUOTE>
 The remainder operations are always exact and so require no rounding.
 </P>
 
 <H3>6.5. Round-to-Integer Operations</H3>
 
 <P>
-For each format, TestFloat can test the IEEE Standard's round-to-integer
+For each format, TestFloat can test the IEEE Standard&rsquo;s round-to-integer
 operation.
 For most TestFloat programs, these operations are:
+<BLOCKQUOTE>
 <PRE>
-     f32_roundToInt
-     f64_roundToInt
-     extF80_roundToInt
-     f128_roundToInt
+f32_roundToInt
+f64_roundToInt
+extF80_roundToInt
+f128_roundToInt
 </PRE>
+</BLOCKQUOTE>
 </P>
 
 <P>
@@ -596,35 +613,40 @@ Just as for conversions to integer types (<NOBR>section 6.1</NOBR> above), the
 all-in-one <CODE>testfloat</CODE> program is again an exception.
 For <CODE>testfloat</CODE> only, the round-to-integer operations have names of
 these forms:
+<BLOCKQUOTE>
 <PRE>
-     &lt;float&gt;_roundToInt_r_&lt;round&gt;
-     &lt;float&gt;_roundToInt_x
+&lt;float&gt;_roundToInt_r_&lt;round&gt;
+&lt;float&gt;_roundToInt_x
 </PRE>
-For the `<CODE>_r_</CODE>' versions, the <I>inexact</I> exception is never
-raised, and the <CODE>&lt;round&gt;</CODE> component specifies the rounding
-mode as one of `<CODE>near_even</CODE>', `<CODE>near_maxMag</CODE>',
-`<CODE>minMag</CODE>', `<CODE>min</CODE>', or `<CODE>max</CODE>'.
+</BLOCKQUOTE>
+For the &lsquo;<CODE>_r_</CODE>&rsquo; versions, the <I>inexact</I> exception
+is never raised, and the <CODE>&lt;round&gt;</CODE> component specifies the
+rounding mode as one of &lsquo;<CODE>near_even</CODE>&rsquo;,
+&lsquo;<CODE>near_maxMag</CODE>&rsquo;, &lsquo;<CODE>minMag</CODE>&rsquo;,
+&lsquo;<CODE>min</CODE>&rsquo;, or &lsquo;<CODE>max</CODE>&rsquo;.
 The usual indication of rounding mode is ignored.
-In contrast, the `<CODE>_x</CODE>' versions accept the usual indication of
-rounding mode and raise the <I>inexact</I> exception whenever the result is not
-exact.
-This irregular system follows the IEEE Standard's precise specification for the
-round-to-integer operations.
+In contrast, the &lsquo;<CODE>_x</CODE>&rsquo; versions accept the usual
+indication of rounding mode and raise the <I>inexact</I> exception whenever the
+result is not exact.
+This irregular system follows the IEEE Standard&rsquo;s precise specification
+for the round-to-integer operations.
 </P>
 
 <H3>6.6. Comparison Operations</H3>
 
 <P>
 The following floating-point comparison operations can be tested:
+<BLOCKQUOTE>
 <PRE>
-     f32_eq      f32_le      f32_lt
-     f64_eq      f64_le      f64_lt
-     extF80_eq   extF80_le   extF80_lt
-     f128_eq     f128_le     f128_lt
+f32_eq      f32_le      f32_lt
+f64_eq      f64_le      f64_lt
+extF80_eq   extF80_le   extF80_lt
+f128_eq     f128_le     f128_lt
 </PRE>
-The abbreviation <CODE>eq</CODE> stands for ``equal'' (=), <CODE>le</CODE>
-stands for ``less than or equal'' (&le;), and <CODE>lt</CODE> stands for
-``less than'' (&lt;).
+</BLOCKQUOTE>
+The abbreviation <CODE>eq</CODE> stands for &ldquo;equal&rdquo; (=),
+<CODE>le</CODE> stands for &ldquo;less than or equal&rdquo; (&le;), and
+<CODE>lt</CODE> stands for &ldquo;less than&rdquo; (&lt;).
 </P>
 
 <P>
@@ -635,12 +657,14 @@ The equality comparisons, on the other hand, are defined by default to raise
 the <I>invalid</I> exception only for signaling NaNs, not for quiet NaNs.
 For completeness, the following additional operations can be tested if
 supported:
+<BLOCKQUOTE>
 <PRE>
-     f32_eq_signaling      f32_le_quiet      f32_lt_quiet
-     f64_eq_signaling      f64_le_quiet      f64_lt_quiet
-     extF80_eq_signaling   extF80_le_quiet   extF80_lt_quiet
-     f128_eq_signaling     f128_le_quiet     f128_lt_quiet
+f32_eq_signaling      f32_le_quiet      f32_lt_quiet
+f64_eq_signaling      f64_le_quiet      f64_lt_quiet
+extF80_eq_signaling   extF80_le_quiet   extF80_lt_quiet
+f128_eq_signaling     f128_le_quiet     f128_lt_quiet
 </PRE>
+</BLOCKQUOTE>
 The <CODE>signaling</CODE> equality comparisons are identical to the standard
 operations except that the <I>invalid</I> exception should be raised for any
 NaN input.
@@ -658,8 +682,8 @@ Any rounding mode is ignored.
 <H2>7. Interpreting TestFloat Output</H2>
 
 <P>
-The ``errors'' reported by TestFloat programs may or may not really represent
-errors in the system being tested.
+The &ldquo;errors&rdquo; reported by TestFloat programs may or may not really
+represent errors in the system being tested.
 For each test case tried, the results from the floating-point implementation
 being tested could differ from the expected results for several reasons:
 <UL>
@@ -694,14 +718,16 @@ For each reported error (or apparent error), a line of text is written to the
 default output.
 If a line would be longer than 79 characters, it is divided.
 The first part of each error line begins in the leftmost column, and any
-subsequent ``continuation'' lines are indented with a tab.
+subsequent &ldquo;continuation&rdquo; lines are indented with a tab.
 </P>
 
 <P>
 Each error reported is of the form:
+<BLOCKQUOTE>
 <PRE>
-     &lt;inputs&gt;  => &lt;observed-output&gt;  expected: &lt;expected-output&gt;
+&lt;inputs&gt;  => &lt;observed-output&gt;  expected: &lt;expected-output&gt;
 </PRE>
+</BLOCKQUOTE>
 The <CODE>&lt;inputs&gt;</CODE> are the inputs to the operation.
 Each output (observed and expected) is shown as a pair:  the result value
 first, followed by the exception flags.
@@ -709,10 +735,12 @@ first, followed by the exception flags.
 
 <P>
 For example, two typical error lines could be
+<BLOCKQUOTE>
 <PRE>
-     800.7FFF00  87F.000100  => 001.000000 ...ux  expected: 001.000000 ....x
-     081.000004  000.1FFFFF  => 001.000000 ...ux  expected: 001.000000 ....x
+800.7FFF00  87F.000100  => 001.000000 ...ux  expected: 001.000000 ....x
+081.000004  000.1FFFFF  => 001.000000 ...ux  expected: 001.000000 ....x
 </PRE>
+</BLOCKQUOTE>
 In the first line, the inputs are <CODE>800.7FFF00</CODE> and
 <CODE>87F.000100</CODE>, and the observed result is <CODE>001.000000</CODE>
 with flags <CODE>...ux</CODE>.
@@ -732,8 +760,9 @@ Four are floating-point types:  <NOBR>32-bit</NOBR> single-precision,
 <NOBR>64-bit</NOBR> double-precision, <NOBR>80-bit</NOBR>
 double-extended-precision, and <NOBR>128-bit</NOBR> quadruple-precision.
 The remaining five types are <NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR>
-unsigned integers, <NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR> two's-complement
-signed integers, and Boolean values (the results of comparison operations).
+unsigned integers, <NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR>
+two&rsquo;s-complement signed integers, and Boolean values (the results of
+comparison operations).
 Boolean values are represented as a single character, either a <CODE>0</CODE>
 or a <CODE>1</CODE>.
 <NOBR>32-bit</NOBR> integers are represented as 8 hexadecimal digits.
@@ -749,47 +778,93 @@ hexadecimal digits that give the raw bits of the floating-point encoding.
 A period separates the 3rd and 4th hexadecimal digits to mark the division
 between the exponent bits and fraction bits.
 Some notable <NOBR>64-bit</NOBR> double-precision values include:
-<PRE>
-     000.0000000000000    +0
-     3FF.0000000000000     1
-     400.0000000000000     2
-     7FF.0000000000000    +infinity
-
-     800.0000000000000    -0
-     BFF.0000000000000    -1
-     C00.0000000000000    -2
-     FFF.0000000000000    -infinity
-
-     3FE.FFFFFFFFFFFFF    largest representable number less than +1
-</PRE>
+<BLOCKQUOTE>
+<TABLE CELLSPACING=0 CELLPADDING=0>
+<TR>
+  <TD><CODE>000.0000000000000&nbsp;&nbsp;&nbsp;&nbsp;</CODE></TD>
+  <TD>+0</TD>
+</TR>
+<TR><TD><CODE>3FF.0000000000000</CODE></TD><TD>&nbsp;1</TD></TR>
+<TR><TD><CODE>400.0000000000000</CODE></TD><TD>&nbsp;2</TD></TR>
+<TR><TD><CODE>7FF.0000000000000</CODE></TD><TD>+infinity</TD></TR>
+<TR><TD>&nbsp;</TD></TR>
+<TR><TD><CODE>800.0000000000000</CODE></TD><TD>&minus;0</TD></TR>
+<TR><TD><CODE>BFF.0000000000000</CODE></TD><TD>&minus;1</TD></TR>
+<TR><TD><CODE>C00.0000000000000</CODE></TD><TD>&minus;2</TD></TR>
+<TR><TD><CODE>FFF.0000000000000</CODE></TD><TD>&minus;infinity</TD></TR>
+<TR><TD>&nbsp;</TD></TR>
+<TR>
+  <TD><CODE>3FE.FFFFFFFFFFFFF</CODE></TD>
+  <TD>largest representable number less than +1</TD>
+</TR>
+</TABLE>
+</BLOCKQUOTE>
 The following categories are easily distinguished (assuming the
 <CODE>x</CODE>s are not all 0):
-<PRE>
-     000.xxxxxxxxxxxxx    positive subnormal (denormalized) numbers
-     7FF.xxxxxxxxxxxxx    positive NaNs
-     800.xxxxxxxxxxxxx    negative subnormal numbers
-     FFF.xxxxxxxxxxxxx    negative NaNs
-</PRE>
+<BLOCKQUOTE>
+<TABLE CELLSPACING=0 CELLPADDING=0>
+<TR>
+  <TD><CODE>000.xxxxxxxxxxxxx&nbsp;&nbsp;&nbsp;&nbsp;</CODE></TD>
+  <TD>positive subnormal (denormalized) numbers</TD>
+</TR>
+<TR><TD><CODE>7FF.xxxxxxxxxxxxx</CODE></TD><TD>positive NaNs</TD></TR>
+<TR>
+  <TD><CODE>800.xxxxxxxxxxxxx</CODE></TD>
+  <TD>negative subnormal numbers</TD>
+</TR>
+<TR><TD><CODE>FFF.xxxxxxxxxxxxx</CODE></TD><TD>negative NaNs</TD></TR>
+</TABLE>
+</BLOCKQUOTE>
 </P>
 
 <P>
 <NOBR>128-bit</NOBR> quadruple-precision values are written the same except
 with 4 hexadecimal digits for the sign and exponent and 28 for the fraction.
 Notable values include:
-<PRE>
-     0000.0000000000000000000000000000    +0
-     3FFF.0000000000000000000000000000     1
-     4000.0000000000000000000000000000     2
-     7FFF.0000000000000000000000000000    +infinity
-
-     8000.0000000000000000000000000000    -0
-     BFFF.0000000000000000000000000000    -1
-     C000.0000000000000000000000000000    -2
-     FFFF.0000000000000000000000000000    -infinity
-
-     3FFE.FFFFFFFFFFFFFFFFFFFFFFFFFFFF    largest representable number
-                                              less than +1
-</PRE>
+<BLOCKQUOTE>
+<TABLE CELLSPACING=0 CELLPADDING=0>
+<TR>
+  <TD>
+    <CODE>0000.0000000000000000000000000000&nbsp;&nbsp;&nbsp;&nbsp;</CODE>
+  </TD>
+  <TD>+0</TD>
+</TR>
+<TR>
+  <TD><CODE>3FFF.0000000000000000000000000000</CODE></TD>
+  <TD>&nbsp;1</TD>
+</TR>
+<TR>
+  <TD><CODE>4000.0000000000000000000000000000</CODE></TD>
+  <TD>&nbsp;2</TD>
+</TR>
+<TR>
+  <TD><CODE>7FFF.0000000000000000000000000000</CODE></TD>
+  <TD>+infinity</TD>
+</TR>
+<TR><TD>&nbsp;</TD></TR>
+<TR>
+  <TD><CODE>8000.0000000000000000000000000000</CODE></TD>
+  <TD>&minus;0</TD>
+</TR>
+<TR>
+  <TD><CODE>BFFF.0000000000000000000000000000</CODE></TD>
+  <TD>&minus;1</TD>
+</TR>
+<TR>
+  <TD><CODE>C000.0000000000000000000000000000</CODE></TD>
+  <TD>&minus;2</TD>
+</TR>
+<TR>
+  <TD><CODE>FFFF.0000000000000000000000000000</CODE></TD>
+  <TD>&minus;infinity</TD>
+</TR>
+<TR><TD>&nbsp;</TD></TR>
+<TR>
+  <TD><CODE>3FFE.FFFFFFFFFFFFFFFFFFFFFFFFFFFF</CODE></TD>
+  <TD>largest representable number less than +1</TD>
+</TR>
+</TABLE>
+</BLOCKQUOTE>
 </P>
 
 <P>
@@ -801,19 +876,27 @@ and will be 1 otherwise.
 Hence, the same values listed above appear in <NOBR>80-bit</NOBR>
 double-extended-precision as follows (note the leading <CODE>8</CODE> digit in
 the significands):
-<PRE>
-     0000.0000000000000000    +0
-     3FFF.8000000000000000     1
-     4000.8000000000000000     2
-     7FFF.8000000000000000    +infinity
-
-     8000.0000000000000000    -0
-     BFFF.8000000000000000    -1
-     C000.8000000000000000    -2
-     FFFF.8000000000000000    -infinity
-
-     3FFE.FFFFFFFFFFFFFFFF    largest representable number less than +1
-</PRE>
+<BLOCKQUOTE>
+<TABLE CELLSPACING=0 CELLPADDING=0>
+<TR>
+  <TD><CODE>0000.0000000000000000&nbsp;&nbsp;&nbsp;&nbsp;</CODE></TD>
+  <TD>+0</TD>
+</TR>
+<TR><TD><CODE>3FFF.8000000000000000</CODE></TD><TD>&nbsp;1</TD></TR>
+<TR><TD><CODE>4000.8000000000000000</CODE></TD><TD>&nbsp;2</TD></TR>
+<TR><TD><CODE>7FFF.8000000000000000</CODE></TD><TD>+infinity</TD></TR>
+<TR><TD>&nbsp;</TD></TR>
+<TR><TD><CODE>8000.0000000000000000</CODE></TD><TD>&minus;0</TD></TR>
+<TR><TD><CODE>BFFF.8000000000000000</CODE></TD><TD>&minus;1</TD></TR>
+<TR><TD><CODE>C000.8000000000000000</CODE></TD><TD>&minus;2</TD></TR>
+<TR><TD><CODE>FFFF.8000000000000000</CODE></TD><TD>&minus;infinity</TD></TR>
+<TR><TD>&nbsp;</TD></TR>
+<TR>
+  <TD><CODE>3FFE.FFFFFFFFFFFFFFFF</CODE></TD>
+  <TD>largest representable number less than +1</TD>
+</TR>
+</TABLE>
+</BLOCKQUOTE>
 </P>
 
 <P>
@@ -826,11 +909,13 @@ These are written as 9 hexadecimal digits, with a period separating the 3rd and
 4th hexadecimal digits.
 Broken out into bits, the 9 hexadecimal digits cover the <NOBR>32-bit</NOBR>
 single-precision subfields as follows:
+<BLOCKQUOTE>
 <PRE>
-     x000 .... ....  .  .... .... .... .... .... ....    sign       (1 bit)
-     .... xxxx xxxx  .  .... .... .... .... .... ....    exponent   (8 bits)
-     .... .... ....  .  0xxx xxxx xxxx xxxx xxxx xxxx    fraction  (23 bits)
+x000 .... ....  .  .... .... .... .... .... ....    sign       (1 bit)
+.... xxxx xxxx  .  .... .... .... .... .... ....    exponent   (8 bits)
+.... .... ....  .  0xxx xxxx xxxx xxxx xxxx xxxx    fraction  (23 bits)
 </PRE>
+</BLOCKQUOTE>
 As shown in this schematic, the first hexadecimal digit contains only the sign,
 and will be either <CODE>0</CODE> <NOBR>or <CODE>8</CODE></NOBR>.
 The next two digits give the biased exponent as an <NOBR>8-bit</NOBR> integer.
@@ -841,27 +926,37 @@ The most significant hexadecimal digit of the fraction can be at most
 
 <P>
 Notable single-precision values include:
-<PRE>
-     000.000000    +0
-     07F.000000     1
-     080.000000     2
-     0FF.000000    +infinity
-
-     800.000000    -0
-     87F.000000    -1
-     880.000000    -2
-     8FF.000000    -infinity
-
-     07E.7FFFFF    largest representable number less than +1
-</PRE>
+<BLOCKQUOTE>
+<TABLE CELLSPACING=0 CELLPADDING=0>
+<TR><TD><CODE>000.000000&nbsp;&nbsp;&nbsp;&nbsp;</CODE></TD><TD>+0</TD></TR>
+<TR><TD><CODE>07F.000000</CODE></TD><TD>&nbsp;1</TD></TR>
+<TR><TD><CODE>080.000000</CODE></TD><TD>&nbsp;2</TD></TR>
+<TR><TD><CODE>0FF.000000</CODE></TD><TD>+infinity</TD></TR>
+<TR><TD>&nbsp;</TD></TR>
+<TR><TD><CODE>800.000000</CODE></TD><TD>&minus;0</TD></TR>
+<TR><TD><CODE>87F.000000</CODE></TD><TD>&minus;1</TD></TR>
+<TR><TD><CODE>880.000000</CODE></TD><TD>&minus;2</TD></TR>
+<TR><TD><CODE>8FF.000000</CODE></TD><TD>&minus;infinity</TD></TR>
+<TR><TD>&nbsp;</TD></TR>
+<TR>
+  <TD><CODE>07E.7FFFFF</CODE></TD>
+  <TD>largest representable number less than +1</TD>
+</TR>
+</TABLE>
+</BLOCKQUOTE>
 Again, certain categories are easily distinguished (assuming the
 <CODE>x</CODE>s are not all 0):
-<PRE>
-     000.xxxxxx    positive subnormal (denormalized) numbers
-     0FF.xxxxxx    positive NaNs
-     800.xxxxxx    negative subnormal numbers
-     8FF.xxxxxx    negative NaNs
-</PRE>
+<BLOCKQUOTE>
+<TABLE CELLSPACING=0 CELLPADDING=0>
+<TR>
+  <TD><CODE>000.xxxxxx&nbsp;&nbsp;&nbsp;&nbsp;</CODE></TD>
+  <TD>positive subnormal (denormalized) numbers</TD>
+</TR>
+<TR><TD><CODE>0FF.xxxxxx</CODE></TD><TD>positive NaNs</TD></TR>
+<TR><TD><CODE>800.xxxxxx</CODE></TD><TD>negative subnormal numbers</TD></TR>
+<TR><TD><CODE>8FF.xxxxxx</CODE></TD><TD>negative NaNs</TD></TR>
+</TABLE>
+</BLOCKQUOTE>
 </P>
 
 <P>
@@ -871,13 +966,21 @@ Each flag is written as either a letter or a period (<CODE>.</CODE>) according
 to whether the flag was set or not by the operation.
 A period indicates the flag was not set.
 The letter used to indicate a set flag depends on the flag:
-<PRE>
-     v    invalid exception
-     i    infinite exception ("divide by zero")
-     o    overflow exception
-     u    underflow exception
-     x    inexact exception
-</PRE>
+<BLOCKQUOTE>
+<TABLE CELLSPACING=0 CELLPADDING=0>
+<TR>
+  <TD><CODE>v&nbsp;&nbsp;&nbsp;&nbsp;</CODE></TD>
+  <TD>invalid exception</TD>
+</TR>
+<TR>
+  <TD><CODE>i</CODE></TD>
+  <TD>infinite exception (&ldquo;divide by zero&rdquo;)</TD>
+</TR>
+<TR><TD><CODE>o</CODE></TD><TD>overflow exception</TD></TR>
+<TR><TD><CODE>u</CODE></TD><TD>underflow exception</TD></TR>
+<TR><TD><CODE>x</CODE></TD><TD>inexact exception</TD></TR>
+</TABLE>
+</BLOCKQUOTE>
 For example, the notation <CODE>...ux</CODE> indicates that the
 <I>underflow</I> and <I>inexact</I> exception flags were set and that the other
 three flags (<I>invalid</I>, <I>infinite</I>, and <I>overflow</I>) were not