Diffstat (limited to 'manual/=float.texinfo')
-rw-r--r-- manual/=float.texinfo | 72
1 file changed, 35 insertions, 37 deletions
diff --git a/manual/=float.texinfo b/manual/=float.texinfo
index a8c9015..d4e3920 100644
--- a/manual/=float.texinfo
+++ b/manual/=float.texinfo
@@ -1,4 +1,4 @@
-@node Floating-Point Limits
+@node Floating-Point Limits
 @chapter Floating-Point Limits
 @pindex <float.h>
 @cindex floating-point number representation
@@ -75,7 +75,7 @@ unsigned quantity.
 
 @cindex mantissa (of floating-point number)
 @cindex significand (of floating-point number)
-@item
+@item
 The @dfn{precision} of the mantissa. If the base of the representation
 is @var{b}, then the precision is the number of base-@var{b} digits in
 the mantissa. This is a constant for the particular representation.
@@ -124,14 +124,14 @@ expression, so the other macros listed here cannot be reliably used in
 places that require constant expressions, such as @samp{#if} preprocessing
 directives and array size specifications.
 
-Although the ANSI C standard specifies minimum and maximum values for
+Although the @w{ISO C} standard specifies minimum and maximum values for
 most of these parameters, the GNU C implementation uses whatever
 floating-point representations are supported by the underlying hardware.
-So whether GNU C actually satisfies the ANSI C requirements depends on
+So whether GNU C actually satisfies the @w{ISO C} requirements depends on
 what machine it is running on.
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro FLT_ROUNDS
 This value characterizes the rounding mode for floating-point addition.
 The following values indicate standard rounding modes:
@@ -155,7 +155,7 @@ mode.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro FLT_RADIX
 This is the value of the base, or radix, of exponent representation.
 This is guaranteed to be a constant expression, unlike the other macros
@@ -163,28 +163,28 @@ described in this section.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro FLT_MANT_DIG
 This is the number of base-@code{FLT_RADIX} digits in the floating-point
 mantissa for the @code{float} data type.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro DBL_MANT_DIG
 This is the number of base-@code{FLT_RADIX} digits in the floating-point
 mantissa for the @code{double} data type.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro LDBL_MANT_DIG
 This is the number of base-@code{FLT_RADIX} digits in the floating-point
 mantissa for the @code{long double} data type.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro FLT_DIG
 This is the number of decimal digits of precision for the @code{float}
 data type. Technically, if @var{p} and @var{b} are the precision and
@@ -198,14 +198,14 @@ The value of this macro is guaranteed to be at least @code{6}.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro DBL_DIG
 This is similar to @code{FLT_DIG}, but is for the @code{double} data
 type. The value of this macro is guaranteed to be at least @code{10}.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro LDBL_DIG
 This is similar to @code{FLT_DIG}, but is for the @code{long double}
 data type. The value of this macro is guaranteed to be at least
@@ -213,7 +213,7 @@ data type. The value of this macro is guaranteed to be at least
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro FLT_MIN_EXP
 This is the minimum negative integer such that the mathematical value
 @code{FLT_RADIX} raised to this power minus 1 can be represented as a
@@ -223,21 +223,21 @@ represented in the exponent field of the number.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro DBL_MIN_EXP
 This is similar to @code{FLT_MIN_EXP}, but is for the @code{double}
 data type.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro LDBL_MIN_EXP
 This is similar to @code{FLT_MIN_EXP}, but is for the @code{long
 double} data type.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro FLT_MIN_10_EXP
 This is the minimum negative integer such that the mathematical value
 @code{10} raised to this power minus 1 can be represented as a
@@ -246,14 +246,14 @@ guaranteed to be no greater than @code{-37}.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro DBL_MIN_10_EXP
 This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{double}
 data type.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro LDBL_MIN_10_EXP
 This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{long
 double} data type.
@@ -262,7 +262,7 @@ double} data type.
 
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro FLT_MAX_EXP
 This is the maximum negative integer such that the mathematical value
 @code{FLT_RADIX} raised to this power minus 1 can be represented as a
@@ -272,21 +272,21 @@ in the exponent field of the number.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro DBL_MAX_EXP
 This is similar to @code{FLT_MAX_EXP}, but is for the @code{double}
 data type.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro LDBL_MAX_EXP
 This is similar to @code{FLT_MAX_EXP}, but is for the @code{long
 double} data type.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro FLT_MAX_10_EXP
 This is the maximum negative integer such that the mathematical value
 @code{10} raised to this power minus 1 can be represented as a
@@ -295,14 +295,14 @@ guaranteed to be at least @code{37}.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro DBL_MAX_10_EXP
 This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{double}
 data type.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro LDBL_MAX_10_EXP
 This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{long
 double} data type.
@@ -310,7 +310,7 @@ double} data type.
 
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro FLT_MAX
 The value of this macro is the maximum representable floating-point
 number of type @code{float}, and is guaranteed to be at least
@@ -318,7 +318,7 @@ number of type @code{float}, and is guaranteed to be at least
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro DBL_MAX
 The value of this macro is the maximum representable floating-point
 number of type @code{double}, and is guaranteed to be at least
@@ -326,7 +326,7 @@ number of type @code{double}, and is guaranteed to be at least
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro LDBL_MAX
 The value of this macro is the maximum representable floating-point
 number of type @code{long double}, and is guaranteed to be at least
@@ -335,7 +335,7 @@ number of type @code{long double}, and is guaranteed to be at least
 
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro FLT_MIN
 The value of this macro is the minimum normalized positive
 floating-point number that is representable by type @code{float}, and is
@@ -343,7 +343,7 @@ guaranteed to be no more than @code{1E-37}.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro DBL_MIN
 The value of this macro is the minimum normalized positive
 floating-point number that is representable by type @code{double}, and
@@ -351,7 +351,7 @@ is guaranteed to be no more than @code{1E-37}.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro LDBL_MIN
 The value of this macro is the minimum normalized positive
 floating-point number that is representable by type @code{long double},
@@ -360,7 +360,7 @@ and is guaranteed to be no more than @code{1E-37}.
 
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro FLT_EPSILON
 This is the minimum positive floating-point number of type @code{float}
 such that @code{1.0 + FLT_EPSILON != 1.0} is true. It's guaranteed to
@@ -368,14 +368,14 @@ be no greater than @code{1E-5}.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro DBL_EPSILON
 This is similar to @code{FLT_EPSILON}, but is for the @code{double}
 type. The maximum value is @code{1E-9}.
 @end defvr
 
 @comment float.h
-@comment ANSI
+@comment ISO
 @defvr Macro LDBL_EPSILON
 This is similar to @code{FLT_EPSILON}, but is for the @code{long double}
 type. The maximum value is @code{1E-9}.
@@ -388,7 +388,8 @@ type. The maximum value is @code{1E-9}.
 
 Here is an example showing how these parameters work for a common
 floating point representation, specified by the @cite{IEEE Standard for
-Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985)}.
+Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985 or ANSI/IEEE
+Std 854-1987)}.
 
 The IEEE single-precision float representation uses a base of 2. There
 is a sign bit, a mantissa with 23 bits plus one hidden bit (so the total
@@ -411,6 +412,3 @@ FLT_MIN 1.17549435E-38F
 FLT_MAX 3.40282347E+38F
 FLT_EPSILON 1.19209290E-07F
 @end example
-
-
-
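The IEEE single-precision values at the end of this diff can be checked on a running system. The C sketch below does three things:

- It compares the `float` macros in `<float.h>` with the single-precision parameters.
- It compares `FLT_MIN`, `FLT_MAX` and `FLT_EPSILON` with the decimal values in the manual's table.
- It measures the gap between `1.0f` and the next larger `float`, which is how ISO C defines `FLT_EPSILON`.

It assumes a C99 compiler and a platform whose `float` is IEEE 754 single precision with round-to-nearest. The three helper names are hypothetical. They are not part of `<float.h>` or the GNU C library.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical helper: true when the <float.h> parameters for float
   match IEEE 754 single precision (radix 2, 24 significand bits
   counting the hidden bit). */
bool float_is_ieee_single(void)
{
    return FLT_RADIX == 2
        && FLT_MANT_DIG == 24
        && FLT_DIG == 6
        && FLT_MIN_EXP == -125
        && FLT_MIN_10_EXP == -37
        && FLT_MAX_EXP == 128
        && FLT_MAX_10_EXP == 38;
}

/* Hypothetical helper: true when FLT_MIN, FLT_MAX and FLT_EPSILON equal
   the decimal values printed in the manual's IEEE single-precision
   table.  Casting both sides to float discards any excess precision the
   compiler may give floating constants (see FLT_EVAL_METHOD). */
bool float_limits_match_table(void)
{
    return (float)FLT_MIN == (float)1.17549435E-38F
        && (float)FLT_MAX == (float)3.40282347E+38F
        && (float)FLT_EPSILON == (float)1.19209290E-07F;
}

/* Hypothetical helper: measures the gap between 1.0f and the next larger
   float by halving a candidate until adding half of it to 1.0f leaves
   the sum unchanged.  Assumes round-to-nearest.  Storing each sum in a
   volatile float forces it to be rounded to float precision. */
float measured_float_epsilon(void)
{
    float eps = 1.0f;
    volatile float sum;

    for (;;) {
        sum = 1.0f + eps / 2.0f;
        if (sum == 1.0f)
            break;          /* 1.0f + eps / 2 rounds back to 1.0f */
        eps /= 2.0f;
    }
    return eps;
}
```

On a machine whose `float` is not IEEE single precision, `float_is_ieee_single()` returns false and the table's numbers should not be assumed. The macros are still defined there, but they carry that machine's values.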