Diffstat (limited to 'doc/TestFloat-general.html')
-rw-r--r-- | doc/TestFloat-general.html | 989 |
1 files changed, 989 insertions, 0 deletions
diff --git a/doc/TestFloat-general.html b/doc/TestFloat-general.html new file mode 100644 index 0000000..1618d4a --- /dev/null +++ b/doc/TestFloat-general.html @@ -0,0 +1,989 @@ + +<HTML> + +<HEAD> +<TITLE>Berkeley TestFloat General Documentation</TITLE> +</HEAD> + +<BODY> + +<H1>Berkeley TestFloat Release 3: General Documentation</H1> + +<P> +John R. Hauser<BR> +2014 ______<BR> +</P> + +<P> +*** CONTENT DONE. +</P> + +<P> +*** REPLACE QUOTATION MARKS. +<BR> +*** REPLACE APOSTROPHES. +<BR> +*** REPLACE EM DASH. +</P> + + +<H2>Contents</H2> + +<P> +*** CHECK.<BR> +*** FIX FORMATTING. +</P> + +<PRE> + Introduction + Limitations + Acknowledgments and License + What TestFloat Does + Executing TestFloat + Operations Tested by TestFloat + Conversion Operations + Basic Arithmetic Operations + Fused Multiply-Add Operations + Remainder Operations + Round-to-Integer Operations + Comparison Operations + Interpreting TestFloat Output + Variations Allowed by the IEEE Floating-Point Standard + Underflow + NaNs + Conversions to Integer + Contact Information +</PRE> + + +<H2>1. Introduction</H2> + +<P> +Berkeley TestFloat is a small collection of programs for testing that an +implementation of binary floating-point conforms to the IEEE Standard for +Floating-Point Arithmetic. +All operations required by the original 1985 version of the IEEE Floating-Point +Standard can be tested, except for conversions to and from decimal. +The following binary formats can be tested: <NOBR>32-bit</NOBR> +single-precision, <NOBR>64-bit</NOBR> double-precision, <NOBR>80-bit</NOBR> +double-extended-precision, and/or <NOBR>128-bit</NOBR> quadruple-precision. +TestFloat cannot test decimal floating-point. +</P> + +<P> +Included in the TestFloat package are the <CODE>testsoftfloat</CODE> and +<CODE>timesoftfloat</CODE> programs for testing the Berkeley SoftFloat software +implementation of floating-point and for measuring its speed. 
+Information about SoftFloat can be found at the SoftFloat Web page, +<A HREF="http://www.jhauser.us/arithmetic/SoftFloat.html"><CODE>http://www.jhauser.us/arithmetic/SoftFloat.html</CODE></A>. +The <CODE>testsoftfloat</CODE> and <CODE>timesoftfloat</CODE> programs are +expected to be of interest only to people compiling the SoftFloat sources. +</P> + +<P> +This document explains how to use the TestFloat programs. +It does not attempt to define or explain much of the IEEE Floating-Point +Standard. +Details about the standard are available elsewhere. +</P> + +<P> +The current version of TestFloat is <NOBR>Release 3</NOBR>. +The set of TestFloat programs as well as the programs' arguments and behavior +have changed some compared to earlier TestFloat releases. +</P> + + +<H2>2. Limitations</H2> + +<P> +TestFloat output is not always easily interpreted. +Detailed knowledge of the IEEE Floating-Point Standard and its vagaries is +needed to use TestFloat responsibly. +</P> + +<P> +TestFloat performs relatively simple tests designed to check the fundamental +soundness of the floating-point under test. +TestFloat may also at times manage to find rarer and more subtle bugs, but it +will probably only find such bugs by chance. +Software that purposefully seeks out various kinds of subtle floating-point +bugs can be found through links posted on the TestFloat Web page +(<A HREF="http://www.jhauser.us/arithmetic/TestFloat.html"><CODE>http://www.jhauser.us/arithmetic/TestFloat.html</CODE></A>). +</P> + + +<H2>3. Acknowledgments and License</H2> + +<P> +The TestFloat package was written by me, <NOBR>John R.</NOBR> Hauser. +<NOBR>Release 3</NOBR> of TestFloat is a completely new implementation +supplanting earlier releases. +This project was done in the employ of the University of California, Berkeley, +within the Department of Electrical Engineering and Computer Sciences, first +for the Parallel Computing Laboratory (Par Lab) and then for the ASPIRE Lab. 
+The work was officially overseen by Prof. Krste Asanovic, with funding provided +by these sources: +<BLOCKQUOTE> +<TABLE> +<TR> +<TD><NOBR>Par Lab:</NOBR></TD> +<TD> +Microsoft (Award #024263), Intel (Award #024894), and U.C. Discovery +(Award #DIG07-10227), with additional support from Par Lab affiliates Nokia, +NVIDIA, Oracle, and Samsung. +</TD> +</TR> +<TR> +<TD><NOBR>ASPIRE Lab:</NOBR></TD> +<TD> +DARPA PERFECT program (Award #HR0011-12-2-0016), with additional support from +ASPIRE industrial sponsor Intel and ASPIRE affiliates Google, Nokia, NVIDIA, +Oracle, and Samsung. +</TD> +</TR> +</TABLE> +</BLOCKQUOTE> +</P> + +<P> +The following applies to the whole of TestFloat <NOBR>Release 3</NOBR> as well +as to each source file individually. +</P> + +<P> +Copyright 2011, 2012, 2013, 2014 The Regents of the University of California +(Regents). +All Rights Reserved. +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: +</P> + +<P> +Redistributions of source code must retain the above copyright notice, this +list of conditions, and the following two paragraphs of disclaimer. +Redistributions in binary form must reproduce the above copyright notice, this +list of conditions, and the following two paragraphs of disclaimer in the +documentation and/or other materials provided with the distribution. +Neither the name of the Regents nor the names of its contributors may be used +to endorse or promote products derived from this software without specific +prior written permission. +</P> + +<P> +IN NO EVENT SHALL REGENTS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, +INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF +THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF REGENTS HAS BEEN +ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+</P> + +<P> +REGENTS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. +THE SOFTWARE AND ACCOMPANYING DOCUMENTATION, IF ANY, PROVIDED HEREUNDER IS +PROVIDED "<NOBR>AS IS</NOBR>". +REGENTS HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, +ENHANCEMENTS, OR MODIFICATIONS. +</P> + + +<H2>4. What TestFloat Does</H2> + +<P> +TestFloat is designed to test a floating-point implementation by comparing its +behavior with that of TestFloat's own internal floating-point implemented in +software. +For each operation to be tested, the TestFloat programs can generate a large +number of test cases, made up of simple pattern tests intermixed with weighted +random inputs. +The cases generated should be adequate for testing carry chain propagations, +plus the rounding of addition, subtraction, multiplication, and simple +operations like conversions. +TestFloat makes a point of checking all boundary cases of the arithmetic, +including underflows, overflows, invalid operations, subnormal inputs, zeros +(positive and negative), infinities, and NaNs. +For the interesting operations like addition and multiplication, millions of +test cases may be checked. +</P> + +<P> +TestFloat is not remarkably good at testing difficult rounding cases for +division and square root. +It also makes no attempt to find bugs specific to SRT division and the like +(such as the infamous Pentium division bug). +Software that tests for such failures can be found through links on the +TestFloat Web page, +<A HREF="http://www.jhauser.us/arithmetic/TestFloat.html"><CODE>http://www.jhauser.us/arithmetic/TestFloat.html</CODE></A>. +</P> + +<P> +NOTE!<BR> +It is the responsibility of the user to verify that the discrepancies TestFloat +finds actually represent faults in the implementation being tested. +Advice to help with this task is provided later in this document. 
+Furthermore, even if TestFloat finds no fault with a floating-point +implementation, that in no way guarantees that the implementation is bug-free. +</P> + +<P> +For each operation, TestFloat can test all five rounding modes defined by the +IEEE Floating-Point Standard. +TestFloat verifies not only that the numeric results of an operation are +correct, but also that the proper floating-point exception flags are raised. +All five exception flags are tested, including the <I>inexact</I> flag. +TestFloat does not attempt to verify that the floating-point exception flags +are actually implemented as sticky flags. +</P> + +<P> +For the <NOBR>80-bit</NOBR> double-extended-precision format, TestFloat can +test the addition, subtraction, multiplication, division, and square root +operations at all three of the standard rounding precisions. +The rounding precision can be set to <NOBR>32 bits</NOBR>, equivalent to +single-precision, to <NOBR>64 bits</NOBR>, equivalent to double-precision, or +to the full <NOBR>80 bits</NOBR> of the double-extended-precision. +Rounding precision control can be applied only to the double-extended-precision +format and only for the five basic arithmetic operations: addition, +subtraction, multiplication, division, and square root. +Other operations can be tested only at full precision. +</P> + +<P> +As a rule, TestFloat is not particular about the bit patterns of NaNs that +appear as operation results. +Any NaN is considered as good a result as another. +This laxness can be overridden so that TestFloat checks for particular bit +patterns within NaN results. +See <NOBR>section 8</NOBR> below, <I>Variations Allowed by the IEEE +Floating-Point Standard</I>, plus the <CODE>-checkNaNs</CODE> option documented +for programs <CODE>testfloat_ver</CODE> and <CODE>testfloat</CODE>. +</P> + +<P> +TestFloat normally compares an implementation of floating-point against the +Berkeley SoftFloat software implementation of floating-point, also created by +me. 
+The SoftFloat functions are linked into each TestFloat program's executable. +Information about SoftFloat can be found at the Web page +<A HREF="http://www.jhauser.us/arithmetic/SoftFloat.html"><CODE>http://www.jhauser.us/arithmetic/SoftFloat.html</CODE></A>. +</P> + +<P> +For testing SoftFloat itself, the TestFloat package includes a +<CODE>testsoftfloat</CODE> program that compares SoftFloat's floating-point +against <EM>another</EM> software floating-point implementation. +The second software floating-point is simpler and slower than SoftFloat, and is +completely independent of SoftFloat. +Although the second software floating-point cannot be guaranteed to be +bug-free, the chance that it would mimic any of SoftFloat's bugs is low. +Consequently, an error in one or the other floating-point version should appear +as an unexpected difference between the two implementations. +Note that testing SoftFloat should be necessary only when compiling a new +TestFloat executable or when compiling SoftFloat for some other reason. +</P> + + +<H2>5. Executing TestFloat</H2> + +<P> +The TestFloat package consists of five programs, all intended to be executed +from a command-line interpreter: +<BLOCKQUOTE> +<TABLE> +<TR> +<TD> +<A HREF="testfloat_gen.html"><CODE>testfloat_gen</CODE></A><CODE> </CODE> +</TD> +<TD> +Generates test cases for a specific floating-point operation. +</TD> +</TR> +<TR> +<TD><A HREF="testfloat_ver.html"><CODE>testfloat_ver</CODE></A></TD> +<TD> +Verifies whether the results from executing a floating-point operation are as +expected. +</TD> +</TR> +<TR> +<TD><A HREF="testfloat.html"><CODE>testfloat</CODE></A></TD> +<TD> +An all-in-one program that generates test cases, executes floating-point +operations, and verifies whether the results match expectations. +</TD> +</TR> +<TR> +<TD> +<A HREF="testsoftfloat.html"><CODE>testsoftfloat</CODE></A><CODE> </CODE> +</TD> +<TD> +Like <CODE>testfloat</CODE>, but for testing SoftFloat. 
+</TD> +</TR> +<TR> +<TD> +<A HREF="timesoftfloat.html"><CODE>timesoftfloat</CODE></A><CODE> </CODE> +</TD> +<TD> +A program for measuring the speed of SoftFloat (included in the TestFloat +package for convenience). +</TD> +</TR> +</TABLE> +</BLOCKQUOTE> +Each program has its own page of documentation that can be opened through the +links in the table above. +</P> + +<P> +To test a floating-point implementation other than SoftFloat, one of three +different methods can be used. +The first method pipes output from <CODE>testfloat_gen</CODE> to a program +that: +<NOBR>(a) reads</NOBR> the incoming test cases, <NOBR>(b) invokes</NOBR> the +floating-point operation being tested, and <NOBR>(c) writes</NOBR> the +operation results to output. +These results can then be piped to <CODE>testfloat_ver</CODE> to be checked for +correctness. +Assuming a vertical bar (<CODE>|</CODE>) indicates a pipe between programs, the +complete process could be written as a single command like so: +<PRE> + testfloat_gen ... <type> | <program-that-invokes-op> | testfloat_ver ... <function> +</PRE> +The program in the middle is not supplied by TestFloat but must be created +independently. +If for some reason this program cannot take command-line arguments, the +<CODE>-prefix</CODE> option of <CODE>testfloat_gen</CODE> can communicate +parameters through the pipe. +</P> + +<P> +A second method for running TestFloat is similar but has +<CODE>testfloat_gen</CODE> supply not only the test inputs but also the +expected results for each case. +With this additional information, the job done by <CODE>testfloat_ver</CODE> +can be folded into the invoking program to give the following command: +<PRE> + testfloat_gen ... <function> | <program-that-invokes-op-and-compares-results> +</PRE> +Again, the program that actually invokes the floating-point operation is not +supplied by TestFloat but must be created independently. 
+Depending on circumstance, it may be preferable either to let +<CODE>testfloat_ver</CODE> check and report suspected errors (first method) or +to include this step in the invoking program (second method). +</P> + +<P> +The third way to use TestFloat is the all-in-one <CODE>testfloat</CODE> +program. +This program can perform all the steps of creating test cases, invoking the +floating-point operation, checking the results, and reporting suspected errors. +However, for this to be possible, <CODE>testfloat</CODE> must be compiled to +contain the method for invoking the floating-point operations to test. +Each build of <CODE>testfloat</CODE> is therefore capable of testing +<EM>only</EM> the floating-point implementation it was built to invoke. +To test a new implementation of floating-point, a new <CODE>testfloat</CODE> +must be created, linked to that specific implementation. +By comparison, the <CODE>testfloat_gen</CODE> and <CODE>testfloat_ver</CODE> +programs are entirely generic; +one instance is usable for testing any floating-point implementation, because +implementation-specific details are segregated in the custom program that +follows <CODE>testfloat_gen</CODE>. +</P> + +<P> +Program <CODE>testsoftfloat</CODE> is another all-in-one program specifically +for testing SoftFloat. +</P> + +<P> +Programs <CODE>testfloat_ver</CODE>, <CODE>testfloat</CODE>, and +<CODE>testsoftfloat</CODE> all report status and error information in a common +way. +As it executes, each of these programs writes status information to the +standard error output, which should be the screen by default. +In order for this status to be displayed properly, the standard error stream +should not be redirected to a file. +Any discrepancies that are found are written to the standard output stream, +which is easily redirected to a file if desired. +Unless redirected, reported errors will appear intermixed with the ongoing +status information in the output. +</P> + + +<H2>6. 
Operations Tested by TestFloat</H2> + +<P> +TestFloat can test all operations required by the original 1985 IEEE +Floating-Point Standard except for conversions to and from decimal. +These operations are: +<UL> +<LI> +conversions among the supported floating-point formats, and also between +integers (<NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR>, signed and unsigned) and +any of the floating-point formats; +<LI> +for each floating-point format, the usual addition, subtraction, +multiplication, division, and square root operations; +<LI> +for each format, the floating-point remainder operation defined by the IEEE +Standard; +<LI> +for each format, a ``round to integer'' operation that rounds to the nearest +integer value in the same format; and +<LI> +comparisons between two values in the same floating-point format. +</UL> +In addition, TestFloat can test +<UL> +<LI> +for each floating-point format except <NOBR>80-bit</NOBR> +double-extended-precision, the fused multiply-add operation defined by the 2008 +IEEE Standard. +</UL> +</P> + +<P> +More information about all these operations is given below. +In the operation names used by TestFloat, <NOBR>32-bit</NOBR> single-precision +is called <CODE>f32</CODE>, <NOBR>64-bit</NOBR> double-precision is +<CODE>f64</CODE>, <NOBR>80-bit</NOBR> double-extended-precision is +<CODE>extF80</CODE>, and <NOBR>128-bit</NOBR> quadruple-precision is +<CODE>f128</CODE>. +TestFloat generally uses the same names for operations as Berkeley SoftFloat, +except that TestFloat's names never include the <CODE>M</CODE> that SoftFloat +uses to indicate that values are passed through pointers. +</P> + +<H3>6.1. Conversion Operations</H3> + +<P> +All conversions among the floating-point formats and all conversions between a +floating-point format and <NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR> integers +can be tested.
+The conversion operations are: +<PRE> + ui32_to_f32 ui64_to_f32 i32_to_f32 i64_to_f32 + ui32_to_f64 ui64_to_f64 i32_to_f64 i64_to_f64 + ui32_to_extF80 ui64_to_extF80 i32_to_extF80 i64_to_extF80 + ui32_to_f128 ui64_to_f128 i32_to_f128 i64_to_f128 + + f32_to_ui32 f64_to_ui32 extF80_to_ui32 f128_to_ui32 + f32_to_ui64 f64_to_ui64 extF80_to_ui64 f128_to_ui64 + f32_to_i32 f64_to_i32 extF80_to_i32 f128_to_i32 + f32_to_i64 f64_to_i64 extF80_to_i64 f128_to_i64 + + f32_to_f64 f64_to_f32 extF80_to_f32 f128_to_f32 + f32_to_extF80 f64_to_extF80 extF80_to_f64 f128_to_f64 + f32_to_f128 f64_to_f128 extF80_to_f128 f128_to_extF80 +</PRE> +Abbreviations <CODE>ui32</CODE> and <CODE>ui64</CODE> indicate +<NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR> unsigned integer types, while +<CODE>i32</CODE> and <CODE>i64</CODE> indicate their signed counterparts. +These conversions all round according to the current rounding mode as relevant. +Conversions from a smaller to a larger floating-point format are always exact +and so require no rounding. +Likewise, conversions from <NOBR>32-bit</NOBR> integers to <NOBR>64-bit</NOBR> +double-precision or to any larger floating-point format are also exact, as are +conversions from <NOBR>64-bit</NOBR> integers to <NOBR>80-bit</NOBR> +double-extended-precision and <NOBR>128-bit</NOBR> quadruple-precision. +</P> + +<P> +For the all-in-one <CODE>testfloat</CODE> program, this list of conversion +operations requires amendment. +For <CODE>testfloat</CODE> only, conversions to an integer type have names that +explicitly specify the rounding mode and treatment of inexactness. 
+Thus, instead of +<PRE> + <float>_to_<int> +</PRE> +as listed above, operations converting to integer type have names of these +forms: +<PRE> + <float>_to_<int>_r_<round> + <float>_to_<int>_rx_<round> +</PRE> +The <CODE><round></CODE> component is one of `<CODE>near_even</CODE>', +`<CODE>near_maxMag</CODE>', `<CODE>minMag</CODE>', `<CODE>min</CODE>', or +`<CODE>max</CODE>', choosing the rounding mode. +Any other indication of rounding mode is ignored. +The operations with `<CODE>_r_</CODE>' in their names never raise the +<I>inexact</I> exception, while those with `<CODE>_rx_</CODE>' raise the +<I>inexact</I> exception whenever the result is not exact. +</P> + +<P> +TestFloat assumes that conversions from floating-point to an integer type +should raise the <I>invalid</I> exception if the input cannot be rounded to an +integer representable by the result format. +In such a circumstance, if the result type is an unsigned integer, TestFloat +expects the result of the operation to be the type's largest integer value. +If the result type is a signed integer and conversion overflows, TestFloat +expects the result to be the largest-magnitude integer with the same sign as +the input. +Lastly, when a NaN is converted to a signed integer type, TestFloat allows +either the largest positive or largest-magnitude negative integer to be +returned. +Conversions to integer types are expected never to raise the <I>overflow</I> +exception. +</P> + +<H3>6.2. Basic Arithmetic Operations</H3> + +<P> +The following standard arithmetic operations can be tested: +<PRE> + f32_add f32_sub f32_mul f32_div f32_sqrt + f64_add f64_sub f64_mul f64_div f64_sqrt + extF80_add extF80_sub extF80_mul extF80_div extF80_sqrt + f128_add f128_sub f128_mul f128_div f128_sqrt +</PRE> +The double-extended-precision (<CODE>extF80</CODE>) operations can be rounded +to reduced precision under rounding precision control. +</P> + +<H3>6.3.
Fused Multiply-Add Operations</H3> + +<P> +For all floating-point formats except <NOBR>80-bit</NOBR> +double-extended-precision, TestFloat can test the fused multiply-add operation +defined by the 2008 IEEE Floating-Point Standard. +The fused multiply-add operations are: +<PRE> + f32_mulAdd + f64_mulAdd + f128_mulAdd +</PRE> +</P> + +<P> +If one of the multiplication operands is infinite and the other is zero, +TestFloat expects the fused multiply-add operation to raise the <I>invalid</I> +exception even if the third operand is a NaN. +</P> + +<H3>6.4. Remainder Operations</H3> + +<P> +For each format, TestFloat can test the IEEE Standard's remainder operation. +These operations are: +<PRE> + f32_rem + f64_rem + extF80_rem + f128_rem +</PRE> +The remainder operations are always exact and so require no rounding. +</P> + +<H3>6.5. Round-to-Integer Operations</H3> + +<P> +For each format, TestFloat can test the IEEE Standard's round-to-integer +operation. +For most TestFloat programs, these operations are: +<PRE> + f32_roundToInt + f64_roundToInt + extF80_roundToInt + f128_roundToInt +</PRE> +</P> + +<P> +Just as for conversions to integer types (<NOBR>section 6.1</NOBR> above), the +all-in-one <CODE>testfloat</CODE> program is again an exception. +For <CODE>testfloat</CODE> only, the round-to-integer operations have names of +these forms: +<PRE> + <float>_roundToInt_r_<round> + <float>_roundToInt_x +</PRE> +For the `<CODE>_r_</CODE>' versions, the <I>inexact</I> exception is never +raised, and the <CODE><round></CODE> component specifies the rounding +mode as one of `<CODE>near_even</CODE>', `<CODE>near_maxMag</CODE>', +`<CODE>minMag</CODE>', `<CODE>min</CODE>', or `<CODE>max</CODE>'. +The usual indication of rounding mode is ignored. +In contrast, the `<CODE>_x</CODE>' versions accept the usual indication of +rounding mode and raise the <I>inexact</I> exception whenever the result is not +exact. 
+This irregular system follows the IEEE Standard's precise specification for the +round-to-integer operations. +</P> + +<H3>6.6. Comparison Operations</H3> + +<P> +The following floating-point comparison operations can be tested: +<PRE> + f32_eq f32_le f32_lt + f64_eq f64_le f64_lt + extF80_eq extF80_le extF80_lt + f128_eq f128_le f128_lt +</PRE> +The abbreviation <CODE>eq</CODE> stands for ``equal'' (=), <CODE>le</CODE> +stands for ``less than or equal'' (≤), and <CODE>lt</CODE> stands for +``less than'' (<). +</P> + +<P> +The IEEE Standard specifies that, by default, the less-than-or-equal and +less-than comparisons raise the <I>invalid</I> exception if either input is any +kind of NaN. +The equality comparisons, on the other hand, are defined by default to raise +the <I>invalid</I> exception only for signaling NaNs, not for quiet NaNs. +For completeness, the following additional operations can be tested if +supported: +<PRE> + f32_eq_signaling f32_le_quiet f32_lt_quiet + f64_eq_signaling f64_le_quiet f64_lt_quiet + extF80_eq_signaling extF80_le_quiet extF80_lt_quiet + f128_eq_signaling f128_le_quiet f128_lt_quiet +</PRE> +The <CODE>signaling</CODE> equality comparisons are identical to the standard +operations except that the <I>invalid</I> exception should be raised for any +NaN input. +Similarly, the <CODE>quiet</CODE> comparison operations should be identical to +their counterparts except that the <I>invalid</I> exception is not raised for +quiet NaNs. +</P> + +<P> +Obviously, no comparison operations ever require rounding. +Any rounding mode is ignored. +</P> + + +<H2>7. Interpreting TestFloat Output</H2> + +<P> +The ``errors'' reported by TestFloat programs may or may not really represent +errors in the system being tested. 
+For each test case tried, the results from the floating-point implementation +being tested could differ from the expected results for several reasons: +<UL> +<LI> +The IEEE Floating-Point Standard allows for some variation in how conforming +floating-point behaves. +Two implementations can sometimes give different results without either being +incorrect. +<LI> +The trusted floating-point emulation could be faulty. +This could be because there is a bug in the way the emulation is coded, or +because a mistake was made when the code was compiled for the current system. +<LI> +The TestFloat program may not work properly, reporting differences that do not +exist. +<LI> +Lastly, the floating-point being tested could actually be faulty. +</UL> +It is the responsibility of the user to determine the causes for the +discrepancies that are reported. +Making this determination can require detailed knowledge about the IEEE +Standard. +Assuming TestFloat is working properly, any differences found will be due to +either the first or last of the reasons above. +Variations in the IEEE Standard that could lead to false error reports are +discussed in <NOBR>section 8</NOBR>, <I>Variations Allowed by the IEEE +Floating-Point Standard</I>. +</P> + +<P> +For each reported error (or apparent error), a line of text is written to the +default output. +If a line would be longer than 79 characters, it is divided. +The first part of each error line begins in the leftmost column, and any +subsequent ``continuation'' lines are indented with a tab. +</P> + +<P> +Each error reported is of the form: +<PRE> + <inputs> => <observed-output> expected: <expected-output> +</PRE> +The <CODE><inputs></CODE> are the inputs to the operation. +Each output (observed and expected) is shown as a pair: the result value +first, followed by the exception flags.
+</P> + +<P> +For example, two typical error lines could be +<PRE> + 800.7FFF00 87F.000100 => 001.000000 ...ux expected: 001.000000 ....x + 081.000004 000.1FFFFF => 001.000000 ...ux expected: 001.000000 ....x +</PRE> +In the first line, the inputs are <CODE>800.7FFF00</CODE> and +<CODE>87F.000100</CODE>, and the observed result is <CODE>001.000000</CODE> +with flags <CODE>...ux</CODE>. +The trusted emulation result is the same but with different flags, +<CODE>....x</CODE>. +Items such as <CODE>800.7FFF00</CODE> composed of hexadecimal digits and a +single period represent floating-point values (here <NOBR>32-bit</NOBR> +single-precision). +The two instances above were reported as errors because the exception flag +results differ. +</P> + +<P> +Aside from the exception flags, there are nine data types that may be +represented. +Four are floating-point types: <NOBR>32-bit</NOBR> single-precision, +<NOBR>64-bit</NOBR> double-precision, <NOBR>80-bit</NOBR> +double-extended-precision, and <NOBR>128-bit</NOBR> quadruple-precision. +The remaining five types are <NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR> +unsigned integers, <NOBR>32-bit</NOBR> and <NOBR>64-bit</NOBR> two's-complement +signed integers, and Boolean values (the results of comparison operations). +Boolean values are represented as a single character, either a <CODE>0</CODE> +or a <CODE>1</CODE>. +<NOBR>32-bit</NOBR> integers are represented as 8 hexadecimal digits. +Thus, for a signed <NOBR>32-bit</NOBR> integer, <CODE>FFFFFFFF</CODE> is -1, +and <CODE>7FFFFFFF</CODE> is the largest positive value. +<NOBR>64-bit</NOBR> integers are the same except with 16 hexadecimal digits. +</P> + +<P> +Floating-point values are written in a correspondingly primitive form. +Values of the <NOBR>64-bit</NOBR> double-precision format are represented by 16 +hexadecimal digits that give the raw bits of the floating-point encoding. 
+A period separates the 3rd and 4th hexadecimal digits to mark the division +between the exponent bits and fraction bits. +Some notable <NOBR>64-bit</NOBR> double-precision values include: +<PRE> + 000.0000000000000 +0 + 3FF.0000000000000 1 + 400.0000000000000 2 + 7FF.0000000000000 +infinity + + 800.0000000000000 -0 + BFF.0000000000000 -1 + C00.0000000000000 -2 + FFF.0000000000000 -infinity + + 3FE.FFFFFFFFFFFFF largest representable number less than +1 +</PRE> +The following categories are easily distinguished (assuming the +<CODE>x</CODE>s are not all 0): +<PRE> + 000.xxxxxxxxxxxxx positive subnormal (denormalized) numbers + 7FF.xxxxxxxxxxxxx positive NaNs + 800.xxxxxxxxxxxxx negative subnormal numbers + FFF.xxxxxxxxxxxxx negative NaNs +</PRE> +</P> + +<P> +<NOBR>128-bit</NOBR> quadruple-precision values are written the same except +with 4 hexadecimal digits for the sign and exponent and 28 for the fraction. +Notable values include: +<PRE> + 0000.0000000000000000000000000000 +0 + 3FFF.0000000000000000000000000000 1 + 4000.0000000000000000000000000000 2 + 7FFF.0000000000000000000000000000 +infinity + + 8000.0000000000000000000000000000 -0 + BFFF.0000000000000000000000000000 -1 + C000.0000000000000000000000000000 -2 + FFFF.0000000000000000000000000000 -infinity + + 3FFE.FFFFFFFFFFFFFFFFFFFFFFFFFFFF largest representable number + less than +1 +</PRE> +</P> + +<P> +<NOBR>80-bit</NOBR> double-extended-precision values are a little unusual in +that the leading bit of precision is not hidden as with other formats. +When correctly encoded, the leading significand bit of an <NOBR>80-bit</NOBR> +double-extended-precision value will be 0 if the value is zero or subnormal, +and will be 1 otherwise. 
+Hence, the same values listed above appear in <NOBR>80-bit</NOBR> +double-extended-precision as follows (note the leading <CODE>8</CODE> digit in +the significands): +<PRE> + 0000.0000000000000000 +0 + 3FFF.8000000000000000 1 + 4000.8000000000000000 2 + 7FFF.8000000000000000 +infinity + + 8000.0000000000000000 -0 + BFFF.8000000000000000 -1 + C000.8000000000000000 -2 + FFFF.8000000000000000 -infinity + + 3FFE.FFFFFFFFFFFFFFFF largest representable number less than +1 +</PRE> +</P> + +<P> +The representation of <NOBR>32-bit</NOBR> single-precision values is unusual +for a different reason. +Because the subfields of standard <NOBR>32-bit</NOBR> single-precision do not +fall on neat <NOBR>4-bit</NOBR> boundaries, the single-precision notation is +slightly irregular. +These values are written as 9 hexadecimal digits, with a period separating the 3rd and +4th hexadecimal digits. +Broken out into bits, the 9 hexadecimal digits cover the <NOBR>32-bit</NOBR> +single-precision subfields as follows: +<PRE> + x000 .... .... . .... .... .... .... .... .... sign (1 bit) + .... xxxx xxxx . .... .... .... .... .... .... exponent (8 bits) + .... .... .... . 0xxx xxxx xxxx xxxx xxxx xxxx fraction (23 bits) +</PRE> +As shown in this schematic, the first hexadecimal digit contains only the sign, +and will be either <CODE>0</CODE> <NOBR>or <CODE>8</CODE></NOBR>. +The next two digits give the biased exponent as an <NOBR>8-bit</NOBR> integer. +This is followed by a period and 6 hexadecimal digits of fraction. +The most significant hexadecimal digit of the fraction can be at most +<NOBR>a <CODE>7</CODE></NOBR>.
+</P> + +<P> +Notable single-precision values include: +<PRE> + 000.000000 +0 + 07F.000000 1 + 080.000000 2 + 0FF.000000 +infinity + + 800.000000 -0 + 87F.000000 -1 + 880.000000 -2 + 8FF.000000 -infinity + + 07E.7FFFFF largest representable number less than +1 +</PRE> +Again, certain categories are easily distinguished (assuming the +<CODE>x</CODE>s are not all 0): +<PRE> + 000.xxxxxx positive subnormal (denormalized) numbers + 0FF.xxxxxx positive NaNs + 800.xxxxxx negative subnormal numbers + 8FF.xxxxxx negative NaNs +</PRE> +</P> + +<P> +Lastly, exception flag values are represented by five characters, one character +per flag. +Each flag is written as either a letter or a period (<CODE>.</CODE>) according +to whether the flag was set or not by the operation. +A period indicates the flag was not set. +The letter used to indicate a set flag depends on the flag: +<PRE> + v invalid exception + i infinite exception ("divide by zero") + o overflow exception + u underflow exception + x inexact exception +</PRE> +For example, the notation <CODE>...ux</CODE> indicates that the +<I>underflow</I> and <I>inexact</I> exception flags were set and that the other +three flags (<I>invalid</I>, <I>infinite</I>, and <I>overflow</I>) were not +set. +The exception flags are always written following the value returned as the +result of the operation. +</P> + + +<H2>8. Variations Allowed by the IEEE Floating-Point Standard</H2> + +<P> +The IEEE Floating-Point Standard admits some variation among conforming +implementations. +Because TestFloat expects the two implementations being compared to deliver +bit-for-bit identical results under most circumstances, this leeway in the +standard can result in false errors being reported if the two implementations +do not make the same choices everywhere the standard provides an option. +</P> + +<H3>8.1. 
Underflow</H3> + +<P> +The standard specifies that the <I>underflow</I> exception flag is to be raised +when two conditions are met simultaneously: +<NOBR>(1) <I>tininess</I></NOBR> and <NOBR>(2) <I>loss of accuracy</I></NOBR>. +</P> + +<P> +A result is tiny when its magnitude is nonzero yet smaller than any normalized +floating-point number. +The standard allows tininess to be determined either before or after a result +is rounded to the destination precision. +If tininess is detected before rounding, some borderline cases will be flagged +as underflows even though the result after rounding actually lies within the +normal floating-point range. +By detecting tininess after rounding, a system can avoid some unnecessary +signaling of underflow. +All the TestFloat programs support options <CODE>-tininessbefore</CODE> and +<CODE>-tininessafter</CODE> to control whether TestFloat expects tininess on +underflow to be detected before or after rounding. +One or the other is selected as the default when TestFloat is compiled, but +these command options allow the default to be overridden. +</P> + +<P> +Loss of accuracy occurs when the subnormal format is not sufficient to +represent an underflowed result accurately. +The original 1985 version of the IEEE Standard allowed loss of accuracy to be +detected either as an <I>inexact result</I> or as a +<I>denormalization loss</I>; +however, few if any systems ever chose the latter. +The latest standard requires that loss of accuracy be detected as an inexact +result, and TestFloat can test only for this case. +</P> + +<H3>8.2. NaNs</H3> + +<P> +The IEEE Standard gives the floating-point formats a large number of NaN +encodings and specifies that NaNs are to be returned as results under certain +conditions. +However, the standard allows an implementation almost complete freedom over +<EM>which</EM> NaN to return in each situation. +</P> + +<P> +By default, TestFloat does not check the bit patterns of NaN results. 
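+</P>

+<P>
+For single-precision results, the comparison this implies can be sketched in C
+as follows.
+The names <CODE>f32_is_nan</CODE>, <CODE>f32_results_match</CODE>, and
+<CODE>check_nan_matching</CODE> are hypothetical and do not come from
+TestFloat's sources; the <CODE>check_nans</CODE> parameter merely stands in for
+the <CODE>-checkNaNs</CODE> option, and exception flags are not considered.
+</P>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A single-precision bit pattern is a NaN when its exponent field is all
   ones and its fraction field is nonzero. */
static bool f32_is_nan(uint32_t bits)
{
    return (bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0;
}

/* Hypothetical comparison of an expected and an actual result. Unless
   check_nans is set, an expected NaN is matched by any NaN; every other
   result must match bit for bit, so +0 and -0 are treated as different. */
static bool f32_results_match(uint32_t expected, uint32_t actual,
                              bool check_nans)
{
    if (!check_nans && f32_is_nan(expected)) {
        return f32_is_nan(actual);
    }
    return expected == actual;
}

/* Demonstrates the rule on a few hand-picked bit patterns. */
static void check_nan_matching(void)
{
    /* Two NaNs with different signs and payloads. */
    assert(f32_results_match(0x7FC00000u, 0xFFC00001u, false));
    assert(!f32_results_match(0x7FC00000u, 0xFFC00001u, true));
    /* Infinity is not a NaN, so it never stands in for one. */
    assert(!f32_results_match(0x7FC00000u, 0x7F800000u, false));
    /* Ordinary results are compared bit for bit. */
    assert(f32_results_match(0x3F800000u, 0x3F800000u, false));
    assert(!f32_results_match(0x00000000u, 0x80000000u, false));
}
```

+<P>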
+When the result of an operation should be a NaN, any NaN is considered as good
+as any other.
+This laxness can be overridden with the <CODE>-checkNaNs</CODE> option of the
+programs <CODE>testfloat_ver</CODE> and <CODE>testfloat</CODE>.
+In order for this option to be sensible, TestFloat must have been compiled so
+that its internal floating-point implementation (SoftFloat) generates the
+proper NaN results for the system being tested.
+</P>

+<H3>8.3. Conversions to Integer</H3>

+<P>
+Conversion of a floating-point value to an integer format will fail if the
+source value is a NaN or if it is too large in magnitude.
+The IEEE Standard does not specify what value should be returned as the integer
+result in these cases.
+Moreover, the standard allows such cases to be signaled either by raising the
+<I>invalid</I> exception or by some unspecified alternative mechanism.
+</P>

+<P>
+TestFloat assumes that conversions to integer will raise the <I>invalid</I>
+exception if the source value cannot be rounded to a representable integer.
+In such cases, TestFloat expects the result value to be the largest-magnitude
+positive or negative integer as detailed earlier in <NOBR>section 6.1</NOBR>,
+<I>Conversion Operations</I>.
+The current version of TestFloat provides no means to alter these expectations.
+</P>


+<H2>9. Contact Information</H2>

+<P>
+At the time of this writing, the most up-to-date information about TestFloat
+and the latest release can be found at the Web page
+<A HREF="http://www.jhauser.us/arithmetic/TestFloat.html"><CODE>http://www.jhauser.us/arithmetic/TestFloat.html</CODE></A>.
+</P>


+</BODY>
+