diff options
Diffstat (limited to 'manual/=limits.texinfo')
-rw-r--r-- | manual/=limits.texinfo | 593 |
1 files changed, 593 insertions, 0 deletions
diff --git a/manual/=limits.texinfo b/manual/=limits.texinfo new file mode 100644 index 0000000..3e384dd6b --- /dev/null +++ b/manual/=limits.texinfo @@ -0,0 +1,593 @@ +@node Representation Limits, System Configuration Limits, System Information, Top +@chapter Representation Limits + +This chapter contains information about constants and parameters that +characterize the representation of the various integer and +floating-point types supported by the GNU C library. + +@menu +* Integer Representation Limits:: Determining maximum and minimum + representation values of + various integer subtypes. +* Floating-Point Limits :: Parameters which characterize + supported floating-point + representations on a particular + system. +@end menu + +@node Integer Representation Limits, Floating-Point Limits , , Representation Limits +@section Integer Representation Limits +@cindex integer representation limits +@cindex representation limits, integer +@cindex limits, integer representation + +Sometimes it is necessary for programs to know about the internal +representation of various integer subtypes. For example, if you want +your program to be careful not to overflow an @code{int} counter +variable, you need to know what the largest representable value that +fits in an @code{int} is. These kinds of parameters can vary from +compiler to compiler and machine to machine. Another typical use of +this kind of parameter is in conditionalizing data structure definitions +with @samp{#ifdef} to select the most appropriate integer subtype that +can represent the required range of values. + +Macros representing the minimum and maximum limits of the integer types +are defined in the header file @file{limits.h}. The values of these +macros are all integer constant expressions. +@pindex limits.h + +@comment limits.h +@comment ANSI +@deftypevr Macro int CHAR_BIT +This is the number of bits in a @code{char}, usually eight. +@end deftypevr + +@comment limits.h +@comment ANSI +@deftypevr Macro int SCHAR_MIN +This is the minimum value that can be represented by a @code{signed char}. +@end deftypevr + +@comment limits.h +@comment ANSI +@deftypevr Macro int SCHAR_MAX +This is the maximum value that can be represented by a @code{signed char}. +@end deftypevr + +@comment limits.h +@comment ANSI +@deftypevr Macro int UCHAR_MAX +This is the maximum value that can be represented by a @code{unsigned char}. +(The minimum value of an @code{unsigned char} is zero.) +@end deftypevr + +@comment limits.h +@comment ANSI +@deftypevr Macro int CHAR_MIN +This is the minimum value that can be represented by a @code{char}. +It's equal to @code{SCHAR_MIN} if @code{char} is signed, or zero +otherwise. +@end deftypevr + +@comment limits.h +@comment ANSI +@deftypevr Macro int CHAR_MAX +This is the maximum value that can be represented by a @code{char}. +It's equal to @code{SCHAR_MAX} if @code{char} is signed, or +@code{UCHAR_MAX} otherwise. +@end deftypevr + +@comment limits.h +@comment ANSI +@deftypevr Macro int SHRT_MIN +This is the minimum value that can be represented by a @code{signed +short int}. On most machines that the GNU C library runs on, +@code{short} integers are 16-bit quantities. +@end deftypevr + +@comment limits.h +@comment ANSI +@deftypevr Macro int SHRT_MAX +This is the maximum value that can be represented by a @code{signed +short int}. +@end deftypevr + +@comment limits.h +@comment ANSI +@deftypevr Macro int USHRT_MAX +This is the maximum value that can be represented by an @code{unsigned +short int}. (The minimum value of an @code{unsigned short int} is zero.) +@end deftypevr + +@comment limits.h +@comment ANSI +@deftypevr Macro int INT_MIN +This is the minimum value that can be represented by a @code{signed +int}. On most machines that the GNU C system runs on, an @code{int} is +a 32-bit quantity. +@end deftypevr + +@comment limits.h +@comment ANSI +@deftypevr Macro int INT_MAX +This is the maximum value that can be represented by a @code{signed +int}. +@end deftypevr + +@comment limits.h +@comment ANSI +@deftypevr Macro {unsigned int} UINT_MAX +This is the maximum value that can be represented by an @code{unsigned +int}. (The minimum value of an @code{unsigned int} is zero.) +@end deftypevr + +@comment limits.h +@comment ANSI +@deftypevr Macro {long int} LONG_MIN +This is the minimum value that can be represented by a @code{signed long +int}. On most machines that the GNU C system runs on, @code{long} +integers are 32-bit quantities, the same size as @code{int}. +@end deftypevr + +@comment limits.h +@comment ANSI +@deftypevr Macro {long int} LONG_MAX +This is the maximum value that can be represented by a @code{signed long +int}. +@end deftypevr + +@comment limits.h +@comment ANSI +@deftypevr Macro {unsigned long int} ULONG_MAX +This is the maximum value that can be represented by an @code{unsigned +long int}. (The minimum value of an @code{unsigned long int} is zero.) +@end deftypevr + +@strong{Incomplete:} There should be corresponding limits for the GNU +C Compiler's @code{long long} type, too. (But they are not now present +in the header file.) + +The header file @file{limits.h} also defines some additional constants +that parameterize various operating system and file system limits. These +constants are described in @ref{System Parameters} and @ref{File System +Parameters}. +@pindex limits.h + + +@node Floating-Point Limits , , Integer Representation Limits, Representation Limits +@section Floating-Point Limits +@cindex floating-point number representation +@cindex representation, floating-point number +@cindex limits, floating-point representation + +Because floating-point numbers are represented internally as approximate +quantities, algorithms for manipulating floating-point data often need +to be parameterized in terms of the accuracy of the representation. +Some of the functions in the C library itself need this information; for +example, the algorithms for printing and reading floating-point numbers +(@pxref{I/O on Streams}) and for calculating trigonometric and +irrational functions (@pxref{Mathematics}) use information about the +underlying floating-point representation to avoid round-off error and +loss of accuracy. User programs that implement numerical analysis +techniques also often need to be parameterized in this way in order to +minimize or compute error bounds. + +The specific representation of floating-point numbers varies from +machine to machine. The GNU C library defines a set of parameters which +characterize each of the supported floating-point representations on a +particular system. + +@menu +* Floating-Point Representation:: Definitions of terminology. +* Floating-Point Parameters:: Descriptions of the library + facilities. +* IEEE Floating Point:: An example of a common + representation. +@end menu + +@node Floating-Point Representation, Floating-Point Parameters, , Floating-Point Limits +@subsection Floating-Point Representation + +This section introduces the terminology used to characterize the +representation of floating-point numbers. + +You are probably already familiar with most of these concepts in terms +of scientific or exponential notation for floating-point numbers. For +example, the number @code{123456.0} could be expressed in exponential +notation as @code{1.23456e+05}, a shorthand notation indicating that the +mantissa @code{1.23456} is multiplied by the base @code{10} raised to +power @code{5}. + +More formally, the internal representation of a floating-point number +can be characterized in terms of the following parameters: + +@itemize @bullet +@item +The @dfn{sign} is either @code{-1} or @code{1}. +@cindex sign (of floating-point number) + +@item +The @dfn{base} or @dfn{radix} for exponentiation; an integer greater +than @code{1}. This is a constant for the particular representation. +@cindex base (of floating-point number) +@cindex radix (of floating-point number) + +@item +The @dfn{exponent} to which the base is raised. The upper and lower +bounds of the exponent value are constants for the particular +representation. +@cindex exponent (of floating-point number) + +Sometimes, in the actual bits representing the floating-point number, +the exponent is @dfn{biased} by adding a constant to it, to make it +always be represented as an unsigned quantity. This is only important +if you have some reason to pick apart the bit fields making up the +floating-point number by hand, which is something for which the GNU +library provides no support. So this is ignored in the discussion that +follows. +@cindex bias (of floating-point number exponent) + +@item +The value of the @dfn{mantissa} or @dfn{significand}, which is an +unsigned integer. +@cindex mantissa (of floating-point number) +@cindex significand (of floating-point number) + +@item +The @dfn{precision} of the mantissa. If the base of the representation +is @var{b}, then the precision is the number of base-@var{b} digits in +the mantissa. This is a constant for the particular representation. + +Many floating-point representations have an implicit @dfn{hidden bit} in +the mantissa. Any such hidden bits are counted in the precision. +Again, the GNU library provides no facilities for dealing with such low-level +aspects of the representation. +@cindex precision (of floating-point number) +@cindex hidden bit (of floating-point number mantissa) +@end itemize + +The mantissa of a floating-point number actually represents an implicit +fraction whose denominator is the base raised to the power of the +precision. Since the largest representable mantissa is one less than +this denominator, the value of the fraction is always strictly less than +@code{1}. The mathematical value of a floating-point number is then the +product of this fraction; the sign; and the base raised to the exponent. + +If the floating-point number is @dfn{normalized}, the mantissa is also +greater than or equal to the base raised to the power of one less +than the precision (unless the number represents a floating-point zero, +in which case the mantissa is zero). The fractional quantity is +therefore greater than or equal to @code{1/@var{b}}, where @var{b} is +the base. +@cindex normalized floating-point number + +@node Floating-Point Parameters, IEEE Floating Point, Floating-Point Representation, Floating-Point Limits +@subsection Floating-Point Parameters + +@strong{Incomplete:} This section needs some more concrete examples +of what these parameters mean and how to use them in a program. + +These macro definitions can be accessed by including the header file +@file{float.h} in your program. +@pindex float.h + +Macro names starting with @samp{FLT_} refer to the @code{float} type, +while names beginning with @samp{DBL_} refer to the @code{double} type +and names beginning with @samp{LDBL_} refer to the @code{long double} +type. (In implementations that do not support @code{long double} as +a distinct data type, the values for those constants are the same +as the corresponding constants for the @code{double} type.)@refill +@cindex @code{float} representation limits +@cindex @code{double} representation limits +@cindex @code{long double} representation limits + +Of these macros, only @code{FLT_RADIX} is guaranteed to be a constant +expression. The other macros listed here cannot be reliably used in +places that require constant expressions, such as @samp{#if} +preprocessing directives or array size specifications. + +Although the ANSI C standard specifies minimum and maximum values for +most of these parameters, the GNU C implementation uses whatever +floating-point representations are supported by the underlying hardware. +So whether GNU C actually satisfies the ANSI C requirements depends on +what machine it is running on. + +@comment float.h +@comment ANSI +@deftypevr Macro int FLT_ROUNDS +This value characterizes the rounding mode for floating-point addition. +The following values indicate standard rounding modes: + +@table @code +@item -1 +The mode is indeterminable. +@item 0 +Rounding is towards zero. +@item 1 +Rounding is to the nearest number. +@item 2 +Rounding is towards positive infinity. +@item 3 +Rounding is towards negative infinity. +@end table + +@noindent +Any other value represents a machine-dependent nonstandard rounding +mode. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int FLT_RADIX +This is the value of the base, or radix, of exponent representation. +This is guaranteed to be a constant expression, unlike the other macros +described in this section. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int FLT_MANT_DIG +This is the number of base-@code{FLT_RADIX} digits in the floating-point +mantissa for the @code{float} data type. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int DBL_MANT_DIG +This is the number of base-@code{FLT_RADIX} digits in the floating-point +mantissa for the @code{double} data type. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int LDBL_MANT_DIG +This is the number of base-@code{FLT_RADIX} digits in the floating-point +mantissa for the @code{long double} data type. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int FLT_DIG +This is the number of decimal digits of precision for the @code{float} +data type. Technically, if @var{p} and @var{b} are the precision and +base (respectively) for the representation, then the decimal precision +@var{q} is the maximum number of decimal digits such that any floating +point number with @var{q} base 10 digits can be rounded to a floating +point number with @var{p} base @var{b} digits and back again, without +change to the @var{q} decimal digits. + +The value of this macro is guaranteed to be at least @code{6}. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int DBL_DIG +This is similar to @code{FLT_DIG}, but is for the @code{double} data +type. The value of this macro is guaranteed to be at least @code{10}. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int LDBL_DIG +This is similar to @code{FLT_DIG}, but is for the @code{long double} +data type. The value of this macro is guaranteed to be at least +@code{10}. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int FLT_MIN_EXP +This is the minimum negative integer such that the mathematical value +@code{FLT_RADIX} raised to this power minus 1 can be represented as a +normalized floating-point number of type @code{float}. In terms of the +actual implementation, this is just the smallest value that can be +represented in the exponent field of the number. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int DBL_MIN_EXP +This is similar to @code{FLT_MIN_EXP}, but is for the @code{double} data +type. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int LDBL_MIN_EXP +This is similar to @code{FLT_MIN_EXP}, but is for the @code{long double} +data type. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int FLT_MIN_10_EXP +This is the minimum negative integer such that the mathematical value +@code{10} raised to this power minus 1 can be represented as a +normalized floating-point number of type @code{float}. This is +guaranteed to be no greater than @code{-37}. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int DBL_MIN_10_EXP +This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{double} +data type. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int LDBL_MIN_10_EXP +This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{long +double} data type. +@end deftypevr + + + +@comment float.h +@comment ANSI +@deftypevr Macro int FLT_MAX_EXP +This is the maximum negative integer such that the mathematical value +@code{FLT_RADIX} raised to this power minus 1 can be represented as a +floating-point number of type @code{float}. In terms of the actual +implementation, this is just the largest value that can be represented +in the exponent field of the number. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int DBL_MAX_EXP +This is similar to @code{FLT_MAX_EXP}, but is for the @code{double} data +type. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int LDBL_MAX_EXP +This is similar to @code{FLT_MAX_EXP}, but is for the @code{long double} +data type. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int FLT_MAX_10_EXP +This is the maximum negative integer such that the mathematical value +@code{10} raised to this power minus 1 can be represented as a +normalized floating-point number of type @code{float}. This is +guaranteed to be at least @code{37}. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int DBL_MAX_10_EXP +This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{double} +data type. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro int LDBL_MAX_10_EXP +This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{long +double} data type. +@end deftypevr + + +@comment float.h +@comment ANSI +@deftypevr Macro double FLT_MAX +The value of this macro is the maximum representable floating-point +number of type @code{float}, and is guaranteed to be at least +@code{1E+37}. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro double DBL_MAX +The value of this macro is the maximum representable floating-point +number of type @code{double}, and is guaranteed to be at least +@code{1E+37}. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro {long double} LDBL_MAX +The value of this macro is the maximum representable floating-point +number of type @code{long double}, and is guaranteed to be at least +@code{1E+37}. +@end deftypevr + + +@comment float.h +@comment ANSI +@deftypevr Macro double FLT_MIN +The value of this macro is the minimum normalized positive +floating-point number that is representable by type @code{float}, and is +guaranteed to be no more than @code{1E-37}. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro double DBL_MIN +The value of this macro is the minimum normalized positive +floating-point number that is representable by type @code{double}, and +is guaranteed to be no more than @code{1E-37}. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro {long double} LDBL_MIN +The value of this macro is the minimum normalized positive +floating-point number that is representable by type @code{long double}, +and is guaranteed to be no more than @code{1E-37}. +@end deftypevr + + +@comment float.h +@comment ANSI +@deftypevr Macro double FLT_EPSILON +This is the minimum positive floating-point number of type @code{float} +such that @code{1.0 + FLT_EPSILON != 1.0} is true. It's guaranteed to +be no greater than @code{1E-5}. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro double DBL_EPSILON +This is similar to @code{FLT_EPSILON}, but is for the @code{double} +type. The maximum value is @code{1E-9}. +@end deftypevr + +@comment float.h +@comment ANSI +@deftypevr Macro {long double} LDBL_EPSILON +This is similar to @code{FLT_EPSILON}, but is for the @code{long double} +type. The maximum value is @code{1E-9}. +@end deftypevr + + +@node IEEE Floating Point, , Floating-Point Parameters, Floating-Point Limits +@subsection IEEE Floating Point +@cindex IEEE floating-point representation +@cindex floating-point, IEEE +@cindex IEEE Std 754 + + +Here is an example showing how these parameters work for a common +floating point representation, specified by the @cite{IEEE Standard for +Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985)}. Nearly +all computers today use this format. + +The IEEE single-precision float representation uses a base of 2. There +is a sign bit, a mantissa with 23 bits plus one hidden bit (so the total +precision is 24 base-2 digits), and an 8-bit exponent that can represent +values in the range -125 to 128, inclusive. + +So, for an implementation that uses this representation for the +@code{float} data type, appropriate values for the corresponding +parameters are: + +@example +FLT_RADIX 2 +FLT_MANT_DIG 24 +FLT_DIG 6 +FLT_MIN_EXP -125 +FLT_MIN_10_EXP -37 +FLT_MAX_EXP 128 +FLT_MAX_10_EXP +38 +FLT_MIN 1.17549435E-38F +FLT_MAX 3.40282347E+38F +FLT_EPSILON 1.19209290E-07F +@end example + +Here are the values for the @code{double} data type: + +@example +DBL_MANT_DIG 53 +DBL_DIG 15 +DBL_MIN_EXP -1021 +DBL_MIN_10_EXP -307 +DBL_MAX_EXP 1024 +DBL_MAX_10_EXP 308 +DBL_MAX 1.7976931348623157E+308 +DBL_MIN 2.2250738585072014E-308 +DBL_EPSILON 2.2204460492503131E-016 +@end example |