Diffstat (limited to 'manual/arith.texi')
-rw-r--r-- manual/arith.texi 404
1 files changed, 380 insertions, 24 deletions
diff --git a/manual/arith.texi b/manual/arith.texi
index d8703ea..86fb266 100644
--- a/manual/arith.texi
+++ b/manual/arith.texi
@@ -3,12 +3,17 @@
This chapter contains information about functions for doing basic
arithmetic operations, such as splitting a float into its integer and
-fractional parts. These functions are declared in the header file
-@file{math.h}.
+fractional parts or retrieving the imaginary part of a complex value.
+These functions are declared in the header files @file{math.h} and
+@file{complex.h}.
@menu
+* Infinity:: What is Infinity and how to test for it.
* Not a Number:: Making NaNs and testing for NaNs.
+* Imaginary Unit:: Constructing complex numbers.
* Predicates on Floats:: Testing for infinity and for NaNs.
+* Floating-Point Classes:: Classify floating-point numbers.
+* Operations on Complex:: Projections, Conjugates, and Decomposing.
* Absolute Value:: Absolute value functions.
* Normalization Functions:: Hacks for radix-2 representations.
* Rounding and Remainders:: Determining the integer and
@@ -19,6 +24,44 @@ fractional parts. These functions are declared in the header file
from strings.
@end menu
+@node Infinity
+@section Infinity Values
+@cindex Infinity
+@cindex IEEE floating point
+
+Mathematical operations can easily produce results which are not
+representable in the floating-point format. The functions in the
+mathematics library have the same problem. The situation is generally
+handled by raising an overflow exception and returning a huge
+value.
+
+The @w{IEEE 754} floating-point standard defines a special value for
+infinity, to be used in these situations.
+
+@comment math.h
+@comment ISO
+@deftypevr Macro float INFINITY
+An expression representing the infinite value. @code{INFINITY} values are
+produced by mathematical operations like @code{1.0 / 0.0}. It is
+possible to continue computations with this value, since the basic
+operations as well as the mathematical library functions are prepared to
+handle such values.
+
+Besides @code{INFINITY}, the value @code{-INFINITY} is also representable,
+and it is handled differently where needed. It is possible to test a
+variable for an infinite value using a simple comparison, but the
+recommended way is to use the @code{isinf} function.
+
+This macro was introduced in the @w{ISO C 9X} standard.
+@end deftypevr
+
+@vindex HUGE_VAL
+The macros @code{HUGE_VAL}, @code{HUGE_VALF} and @code{HUGE_VALL} are
+defined in a similar way but they are not required to represent the
+infinite value, only a very large value (@pxref{Domain and Range Errors}).
+If infinity is actually wanted, @code{INFINITY} should be used.
+
+
@node Not a Number
@section ``Not a Number'' Values
@cindex NaN
@@ -54,6 +97,46 @@ such as by defining @code{_GNU_SOURCE}, and then you must include
@file{math.h}.)
@end deftypevr
+@node Imaginary Unit
+@section Constructing Complex Numbers
+
+@pindex complex.h
+To construct complex numbers it is necessary to have a way to express the
+imaginary part of a number. In mathematics one uses the symbol ``i''
+to mark a number as imaginary. For convenience the @file{complex.h}
+header defines two macros which allow a similarly simple notation.
+
+@deftypevr Macro float_t _Imaginary_I
+This macro is a (compiler-specific) representation of the value ``1i''.
+I.e., it is the value for which
+
+@smallexample
+_Imaginary_I * _Imaginary_I = -1
+@end smallexample
+
+@noindent
+One can use it to easily construct complex numbers, as in
+
+@smallexample
+3.0 - _Imaginary_I * 4.0
+@end smallexample
+
+@noindent
+which results in the complex number with a real part of 3.0 and an
+imaginary part of -4.0.
+@end deftypevr
+
+@noindent
+A more intuitive approach is to use the following macro.
+
+@deftypevr Macro float_t I
+This macro has exactly the same value as @code{_Imaginary_I}. The
+problem is that the name @code{I} can very easily clash with macros or
+variables in programs, so it might be a good idea to avoid this name
+and stay on the safe side by using @code{_Imaginary_I}.
+@end deftypevr
+
+
@node Predicates on Floats
@section Predicates on Floats
@@ -66,6 +149,10 @@ functions, and thus are available if you define @code{_BSD_SOURCE} or
@comment math.h
@comment BSD
@deftypefun int isinf (double @var{x})
+@end deftypefun
+@deftypefun int isinff (float @var{x})
+@end deftypefun
+@deftypefun int isinfl (long double @var{x})
This function returns @code{-1} if @var{x} represents negative infinity,
@code{1} if @var{x} represents positive infinity, and @code{0} otherwise.
@end deftypefun
@@ -73,6 +160,10 @@ This function returns @code{-1} if @var{x} represents negative infinity,
@comment math.h
@comment BSD
@deftypefun int isnan (double @var{x})
+@end deftypefun
+@deftypefun int isnanf (float @var{x})
+@end deftypefun
+@deftypefun int isnanl (long double @var{x})
This function returns a nonzero value if @var{x} is a ``not a number''
value, and zero otherwise. (You can just as well use @code{@var{x} !=
@var{x}} to get the same result).
@@ -81,6 +172,10 @@ value, and zero otherwise. (You can just as well use @code{@var{x} !=
@comment math.h
@comment BSD
@deftypefun int finite (double @var{x})
+@end deftypefun
+@deftypefun int finitef (float @var{x})
+@end deftypefun
+@deftypefun int finitel (long double @var{x})
This function returns a nonzero value if @var{x} is neither infinite nor
a ``not a number'' value, and zero otherwise.
@end deftypefun
@@ -103,6 +198,189 @@ does not fit the @w{ISO C} specification.
@strong{Portability Note:} The functions listed in this section are BSD
extensions.
+@node Floating-Point Classes
+@section Floating-Point Number Classification Functions
+
+Instead of using the BSD-specific functions from the last section, it is
+better to use those in this section, which were introduced in the @w{ISO C
+9X} standard and are therefore widely available.
+
+@comment math.h
+@comment ISO
+@deftypefun int fpclassify (@emph{float-type} @var{x})
+This is a generic macro which works on all floating-point types and
+which returns a value of type @code{int}. The possible values are:
+
+@vtable @code
+@item FP_NAN
+ The floating-point number @var{x} is ``Not a Number'' (@pxref{Not a Number})
+@item FP_INFINITE
+ The value of @var{x} is either plus or minus infinity (@pxref{Infinity})
+@item FP_ZERO
+ The value of @var{x} is zero. In floating-point formats like @w{IEEE
+ 754}, where zero can be signed, this value is also returned if
+ @var{x} is minus zero.
+@item FP_SUBNORMAL
+ Some floating-point formats (such as @w{IEEE 754}) allow floating-point
+ numbers to be represented in a denormalized format. This happens if the
+ absolute value of the number is too small to be represented in the
+ normal format. @code{FP_SUBNORMAL} is returned for such values of @var{x}.
+@item FP_NORMAL
+ This value is returned in all other cases, which means the number is a
+ plain floating-point number without special meaning.
+@end vtable
+
+This macro is useful if more than one property of a number must be
+tested. If one only has to test for, e.g., a NaN value, there are
+functions which are faster.
+@end deftypefun
+
+The remainder of this section introduces some more specific functions.
+They might be implemented faster than the call to @code{fpclassify}, and
+if the actual need in the program is covered by these functions, they
+should be used (and not @code{fpclassify}).
+
+@comment math.h
+@comment ISO
+@deftypefun int isfinite (@emph{float-type} @var{x})
+The value returned by this macro is nonzero if the value of @var{x} is
+not plus or minus infinity and not NaN. I.e., it could be implemented as
+
+@smallexample
+(fpclassify (x) != FP_NAN && fpclassify (x) != FP_INFINITE)
+@end smallexample
+
+@code{isfinite} is also implemented as a macro which can handle all
+floating-point types. Programs should use this macro instead of
+@code{finite} (@pxref{Predicates on Floats}).
+@end deftypefun
+
+@comment math.h
+@comment ISO
+@deftypefun int isnormal (@emph{float-type} @var{x})
+If @code{isnormal} returns a nonzero value, the value of @var{x} is
+neither a NaN, infinity, zero, nor a denormalized number. I.e., it
+could be implemented as
+
+@smallexample
+(fpclassify (x) == FP_NORMAL)
+@end smallexample
+@end deftypefun
+
+@comment math.h
+@comment ISO
+@deftypefun int isnan (@emph{float-type} @var{x})
+The situation with this macro is a bit complicated. Here @code{isnan}
+is a macro which can handle all kinds of floating-point types. It
+returns a nonzero value if @var{x} represents a NaN value and
+could be written like this
+
+@smallexample
+(fpclassify (x) == FP_NAN)
+@end smallexample
+
+The complication is that there is a function of the same name and the
+same semantics defined for compatibility with BSD (@pxref{Predicates on
+Floats}). Fortunately this should not lead to problems in most cases,
+since the macro and the function have the same semantics. Should the
+function be absolutely necessary in some situation, one can use
+
+@smallexample
+(isnan) (x)
+@end smallexample
+
+@noindent
+to avoid the macro expansion. Using the macro has two big advantages:
+it is more portable and one does not have to choose the right function
+among @code{isnan}, @code{isnanf}, and @code{isnanl}.
+@end deftypefun
+
+
+@node Operations on Complex
+@section Projections, Conjugates, and Decomposing of Complex Numbers
+@cindex project complex numbers
+@cindex conjugate complex numbers
+@cindex decompose complex numbers
+
+This section lists functions performing some of the simple mathematical
+operations on complex numbers. Using any of these functions requires that
+the C compiler understands the @code{_Complex} keyword (spelled
+@code{complex} after including @file{complex.h}), introduced to the
+C language in the @w{ISO C 9X} standard.
+
+@pindex complex.h
+The prototypes for all functions in this section can be found in
+@file{complex.h}. All functions are available in three variants, one
+for each of the three floating-point types.
+
+The easiest operation on complex numbers is the decomposition into the
+real part and the imaginary part. This is done by the following
+functions.
+
+@comment complex.h
+@comment ISO
+@deftypefun double creal (complex double @var{z})
+@end deftypefun
+@deftypefun float crealf (complex float @var{z})
+@end deftypefun
+@deftypefun {long double} creall (complex long double @var{z})
+These functions return the real part of the complex number @var{z}.
+@end deftypefun
+
+@comment complex.h
+@comment ISO
+@deftypefun double cimag (complex double @var{z})
+@end deftypefun
+@deftypefun float cimagf (complex float @var{z})
+@end deftypefun
+@deftypefun {long double} cimagl (complex long double @var{z})
+These functions return the imaginary part of the complex number @var{z}.
+@end deftypefun
+
+
+The complex conjugate of a given complex number has the same real part,
+but its imaginary part is negated.
+
+@comment complex.h
+@comment ISO
+@deftypefun {complex double} conj (complex double @var{z})
+@end deftypefun
+@deftypefun {complex float} conjf (complex float @var{z})
+@end deftypefun
+@deftypefun {complex long double} conjl (complex long double @var{z})
+These functions return the complex conjugate of the complex number
+@var{z}.
+@end deftypefun
+
+@comment complex.h
+@comment ISO
+@deftypefun double carg (complex double @var{z})
+@end deftypefun
+@deftypefun float cargf (complex float @var{z})
+@end deftypefun
+@deftypefun {long double} cargl (complex long double @var{z})
+These functions return the argument of the complex number @var{z}.
+
+Mathematically, the argument is the phase angle of @var{z} with a branch
+cut along the negative real axis.
+@end deftypefun
+
+@comment complex.h
+@comment ISO
+@deftypefun {complex double} cproj (complex double @var{z})
+@end deftypefun
+@deftypefun {complex float} cprojf (complex float @var{z})
+@end deftypefun
+@deftypefun {complex long double} cprojl (complex long double @var{z})
+Return the projection of the complex value @var{z} onto the Riemann
+sphere. Finite values are returned unchanged. All complex infinities,
+i.e., values with at least one infinite part (even if the other part
+is NaN), are projected to positive infinity on the real axis; the
+result is then equivalent to
+
+@smallexample
+INFINITY + I * copysign (0.0, cimag (z))
+@end smallexample
+@end deftypefun
+
+
@node Absolute Value
@section Absolute Value
@cindex absolute value functions
@@ -117,7 +395,8 @@ whose imaginary part is @var{y}, the absolute value is @w{@code{sqrt
@pindex math.h
@pindex stdlib.h
Prototypes for @code{abs} and @code{labs} are in @file{stdlib.h};
-@code{fabs} and @code{cabs} are declared in @file{math.h}.
+@code{fabs}, @code{fabsf} and @code{fabsl} are declared in @file{math.h};
+@code{cabs}, @code{cabsf} and @code{cabsl} are declared in @file{complex.h}.
@comment stdlib.h
@comment ISO
@@ -139,20 +418,28 @@ are of type @code{long int} rather than @code{int}.
@comment math.h
@comment ISO
@deftypefun double fabs (double @var{number})
+@end deftypefun
+@deftypefun float fabsf (float @var{number})
+@end deftypefun
+@deftypefun {long double} fabsl (long double @var{number})
This function returns the absolute value of the floating-point number
@var{number}.
@end deftypefun
-@comment math.h
-@comment BSD
-@deftypefun double cabs (struct @{ double real, imag; @} @var{z})
-The @code{cabs} function returns the absolute value of the complex
-number @var{z}, whose real part is @code{@var{z}.real} and whose
-imaginary part is @code{@var{z}.imag}. (See also the function
-@code{hypot} in @ref{Exponents and Logarithms}.) The value is:
+@comment complex.h
+@comment ISO
+@deftypefun double cabs (complex double @var{z})
+@end deftypefun
+@deftypefun float cabsf (complex float @var{z})
+@end deftypefun
+@deftypefun {long double} cabsl (complex long double @var{z})
+These functions return the absolute value of the complex number @var{z}.
+The compiler must support complex numbers to use these functions. (See
+also the function @code{hypot} in @ref{Exponents and Logarithms}.) The
+value is:
@smallexample
-sqrt (@var{z}.real*@var{z}.real + @var{z}.imag*@var{z}.imag)
+sqrt (creal (@var{z}) * creal (@var{z}) + cimag (@var{z}) * cimag (@var{z}))
@end smallexample
@end deftypefun
@@ -174,7 +461,11 @@ All these functions are declared in @file{math.h}.
@comment math.h
@comment ISO
@deftypefun double frexp (double @var{value}, int *@var{exponent})
-The @code{frexp} function is used to split the number @var{value}
+@end deftypefun
+@deftypefun float frexpf (float @var{value}, int *@var{exponent})
+@end deftypefun
+@deftypefun {long double} frexpl (long double @var{value}, int *@var{exponent})
+These functions are used to split the number @var{value}
into a normalized fraction and an exponent.
If the argument @var{value} is not zero, the return value is @var{value}
@@ -193,7 +484,11 @@ zero is stored in @code{*@var{exponent}}.
@comment math.h
@comment ISO
@deftypefun double ldexp (double @var{value}, int @var{exponent})
-This function returns the result of multiplying the floating-point
+@end deftypefun
+@deftypefun float ldexpf (float @var{value}, int @var{exponent})
+@end deftypefun
+@deftypefun {long double} ldexpl (long double @var{value}, int @var{exponent})
+These functions return the result of multiplying the floating-point
number @var{value} by 2 raised to the power @var{exponent}. (It can
be used to reassemble floating-point numbers that were taken apart
by @code{frexp}.)
@@ -207,13 +502,21 @@ equivalent to those of @code{ldexp} and @code{frexp}:
@comment math.h
@comment BSD
@deftypefun double scalb (double @var{value}, double @var{exponent})
+@end deftypefun
+@deftypefun float scalbf (float @var{value}, float @var{exponent})
+@end deftypefun
+@deftypefun {long double} scalbl (long double @var{value}, long double @var{exponent})
The @code{scalb} function is the BSD name for @code{ldexp}.
@end deftypefun
@comment math.h
@comment BSD
@deftypefun double logb (double @var{x})
-This BSD function returns the integer part of the base-2 logarithm of
+@end deftypefun
+@deftypefun float logbf (float @var{x})
+@end deftypefun
+@deftypefun {long double} logbl (long double @var{x})
+These BSD functions return the integer part of the base-2 logarithm of
@var{x}, an integer value represented as a floating-point number. This is
the highest integer power of @code{2} contained in @var{x}. The sign of
@var{x} is ignored. For example, @code{logb (3.5)} is @code{1.0} and
@@ -231,11 +534,28 @@ The value returned by @code{logb} is one less than the value that
@end deftypefun
@comment math.h
-@comment BSD
+@comment ISO
@deftypefun double copysign (double @var{value}, double @var{sign})
-The @code{copysign} function returns a value whose absolute value is the
+@end deftypefun
+@deftypefun float copysignf (float @var{value}, float @var{sign})
+@end deftypefun
+@deftypefun {long double} copysignl (long double @var{value}, long double @var{sign})
+These functions return a value whose absolute value is the
same as that of @var{value}, and whose sign matches that of @var{sign}.
-This is a BSD function.
+These functions appear in BSD and were standardized in @w{ISO C 9X}.
+@end deftypefun
+
+@comment math.h
+@comment ISO
+@deftypefun int signbit (@emph{float-type} @var{x})
+@code{signbit} is a generic macro which can work on all floating-point
+types. It returns a nonzero value if the value of @var{x} has its sign
+bit set.
+
+This is not the same as @code{x < 0.0}, since in some floating-point
+formats (e.g., @w{IEEE 754}) zero can be signed. The
+comparison @code{-0.0 < 0.0} is false, while @code{signbit
+(-0.0)} returns a nonzero value.
@end deftypefun
@node Rounding and Remainders
@@ -260,7 +580,11 @@ result as a @code{double} instead to get around this problem.
@comment math.h
@comment ISO
@deftypefun double ceil (double @var{x})
-The @code{ceil} function rounds @var{x} upwards to the nearest integer,
+@end deftypefun
+@deftypefun float ceilf (float @var{x})
+@end deftypefun
+@deftypefun {long double} ceill (long double @var{x})
+These functions round @var{x} upwards to the nearest integer,
returning that value as a floating-point number. Thus, @code{ceil (1.5)}
is @code{2.0}.
@end deftypefun
@@ -268,15 +592,23 @@ is @code{2.0}.
@comment math.h
@comment ISO
@deftypefun double floor (double @var{x})
-The @code{ceil} function rounds @var{x} downwards to the nearest
+@end deftypefun
+@deftypefun float floorf (float @var{x})
+@end deftypefun
+@deftypefun {long double} floorl (long double @var{x})
+These functions round @var{x} downwards to the nearest
integer, returning that value as a floating-point number. Thus, @code{floor
(1.5)} is @code{1.0} and @code{floor (-1.5)} is @code{-2.0}.
@end deftypefun
@comment math.h
-@comment BSD
+@comment ISO
@deftypefun double rint (double @var{x})
-This function rounds @var{x} to an integer value according to the
+@end deftypefun
+@deftypefun float rintf (float @var{x})
+@end deftypefun
+@deftypefun {long double} rintl (long double @var{x})
+These functions round @var{x} to an integer value according to the
current rounding mode. @xref{Floating Point Parameters}, for
information about the various rounding modes. The default
rounding mode is to round to the nearest integer; some machines
@@ -286,8 +618,24 @@ you explicit select another.
@comment math.h
@comment ISO
+@deftypefun double nearbyint (double @var{x})
+@end deftypefun
+@deftypefun float nearbyintf (float @var{x})
+@end deftypefun
+@deftypefun {long double} nearbyintl (long double @var{x})
+These functions return the same value as the @code{rint} functions, but
+they do @emph{not} raise the inexact exception, even if rounding
+actually takes place.
+@end deftypefun
+
+@comment math.h
+@comment ISO
@deftypefun double modf (double @var{value}, double *@var{integer-part})
-This function breaks the argument @var{value} into an integer part and a
+@end deftypefun
+@deftypefun float modff (float @var{value}, float *@var{integer-part})
+@end deftypefun
+@deftypefun {long double} modfl (long double @var{value}, long double *@var{integer-part})
+These functions break the argument @var{value} into an integer part and a
fractional part (between @code{-1} and @code{1}, exclusive). Their sum
equals @var{value}. Each of the parts has the same sign as @var{value},
so the rounding of the integer part is towards zero.
@@ -300,7 +648,11 @@ returns @code{0.5} and stores @code{2.0} into @code{intpart}.
@comment math.h
@comment ISO
@deftypefun double fmod (double @var{numerator}, double @var{denominator})
-This function computes the remainder from the division of
+@end deftypefun
+@deftypefun float fmodf (float @var{numerator}, float @var{denominator})
+@end deftypefun
+@deftypefun {long double} fmodl (long double @var{numerator}, long double @var{denominator})
+These functions compute the remainder from the division of
@var{numerator} by @var{denominator}. Specifically, the return value is
@code{@var{numerator} - @w{@var{n} * @var{denominator}}}, where @var{n}
is the quotient of @var{numerator} divided by @var{denominator}, rounded
@@ -317,7 +669,11 @@ If @var{denominator} is zero, @code{fmod} fails and sets @code{errno} to
@comment math.h
@comment BSD
@deftypefun double drem (double @var{numerator}, double @var{denominator})
-The function @code{drem} is like @code{fmod} except that it rounds the
+@end deftypefun
+@deftypefun float dremf (float @var{numerator}, float @var{denominator})
+@end deftypefun
+@deftypefun {long double} dreml (long double @var{numerator}, long double @var{denominator})
+These functions are like the corresponding @code{fmod} functions except that they round the
internal quotient @var{n} to the nearest integer instead of towards zero
to an integer. For example, @code{drem (6.5, 2.3)} returns @code{-0.4},
which is @code{6.5} minus @code{6.9}.