author    Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>  2023-03-20 13:01:16 -0300
committer Adhemerval Zanella <adhemerval.zanella@linaro.org>        2023-04-03 16:36:24 -0300
commit    34b9f8bc170810c44184ad57ecf1800587e752a6
tree      9be9f9e44652729248cfef192eafa4227d670752 /sysdeps/ieee754/dbl-64/math_config.h
parent    5c11701c518276fcf12ff7d8f27e3c7102e97542
math: Improve fmod
This uses a new algorithm similar to one already proposed earlier [1].
With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
the simplest implementation relies on the identity

  mx * 2^ex == 2 * mx * 2^(ex - 1)

to lower the exponent one step at a time:

  while (ex > ey)
    {
      mx *= 2;
      --ex;
      mx %= my;
    }
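As a hand-worked example (mine, not from the patch): for x = 10 = 5 * 2^1 and y = 3 = 3 * 2^0, one iteration doubles mx to 10 and drops ex to 0, and mx %= 3 leaves mx = 1; the remainder is then 1 * 2^0 = 1, which matches fmod (10, 3).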
With mx/my being the mantissas of double floating-point values, on each
step the argument reduction can be improved by 11 bits (the bit size of
uint64_t minus MANTISSA_WIDTH and the implicit bit):

  while (ex > ey)
    {
      mx <<= 11;
      ex -= 11;
      mx %= my;
    }
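Putting both pieces together, a minimal self-contained sketch of this
reduction (an illustration under stated assumptions, not the glibc code:
it only handles positive normal inputs with x >= y, and the fmod_sketch
name is hypothetical):

  #include <math.h>
  #include <stdint.h>
  #include <string.h>

  /* Illustrative fmod for positive normal doubles with x >= y, using
     the 11-bits-per-step reduction described above.  */
  static double
  fmod_sketch (double x, double y)
  {
    uint64_t hx, hy;
    memcpy (&hx, &x, sizeof hx);
    memcpy (&hy, &y, sizeof hy);

    int ex = (hx >> 52) & 0x7ff;
    int ey = (hy >> 52) & 0x7ff;
    /* Mantissas with the implicit leading bit made explicit.  */
    uint64_t mx = (hx & UINT64_C (0xfffffffffffff)) | UINT64_C (1) << 52;
    uint64_t my = (hy & UINT64_C (0xfffffffffffff)) | UINT64_C (1) << 52;

    /* Full 11-bit steps; mx < 2^53 before each shift, so mx << 11
       cannot overflow uint64_t.  */
    while (ex > ey + 11)
      {
        mx <<= 11;
        ex -= 11;
        mx %= my;
      }
    /* Final partial step of the remaining ex - ey bits.  */
    mx <<= ex - ey;
    mx %= my;

    /* A normal double equals its 53-bit mantissa times 2^(e - 1075),
       so the exact remainder is mx * 2^(ey - 1075).  */
    return ldexp ((double) mx, ey - 1075);
  }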
The implementation uses the clz and ctz builtins, along with shifts, to
convert hx/hy back to doubles. Unlike the original patch, this patch
assumes the modulo/divide operation is slow, so it uses multiplication
by precomputed inverse values instead.
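As an illustration of the inverse-multiplication idea (a generic sketch,
not the code from this patch; mod_mul_inv is a hypothetical helper, and
unsigned __int128 is a GCC extension available on the targets glibc
supports):

  #include <stdint.h>

  /* Compute mx % my using a precomputed approximate inverse
     inv = UINT64_MAX / my instead of a hardware divide.  */
  static inline uint64_t
  mod_mul_inv (uint64_t mx, uint64_t my, uint64_t inv)
  {
    /* The high 64 bits of mx * inv approximate the quotient from below.  */
    uint64_t q = (uint64_t) (((unsigned __int128) mx * inv) >> 64);
    uint64_t r = mx - q * my;
    /* inv slightly underestimates 2^64 / my, so q may be low by a
       small amount; fix up with cheap subtractions.  */
    while (r >= my)
      r -= my;
    return r;
  }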
I see the following performance improvements using the fmod benchtests
(results show only the 'mean' value):
Architecture     | Input           | master   | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormal       | 19.1584  | 12.5049
x86_64 (Ryzen 9) | normal          | 1016.51  | 296.939
x86_64 (Ryzen 9) | close-exponents | 18.4428  | 16.0244
aarch64 (N1)     | subnormal       | 11.153   | 6.81778
aarch64 (N1)     | normal          | 528.649  | 155.62
aarch64 (N1)     | close-exponents | 11.4517  | 8.21306
I also see similar improvements on arm-linux-gnueabihf when running on
the same N1 aarch64 chip, where it uses a lot of soft-fp routines (for
modulo, clz, ctz, and multiplication):
Architecture     | Input           | master   | patch
-----------------|-----------------|----------|--------
armhf (N1)       | subnormal       | 15.908   | 15.1083
armhf (N1)       | normal          | 837.525  | 244.833
armhf (N1)       | close-exponents | 16.2111  | 21.8182
Instead of the math_private.h definitions, I used math_config.h, which
is used by the newer math implementations.
Co-authored-by: kirill <kirill.okhotnikov@gmail.com>
[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Diffstat (limited to 'sysdeps/ieee754/dbl-64/math_config.h')
-rw-r--r--  sysdeps/ieee754/dbl-64/math_config.h  61
1 file changed, 61 insertions, 0 deletions
diff --git a/sysdeps/ieee754/dbl-64/math_config.h b/sysdeps/ieee754/dbl-64/math_config.h
index 3cbaeed..2049cea 100644
--- a/sysdeps/ieee754/dbl-64/math_config.h
+++ b/sysdeps/ieee754/dbl-64/math_config.h
@@ -43,6 +43,24 @@
 # define TOINT_INTRINSICS 0
 #endif
 
+static inline int
+clz_uint64 (uint64_t x)
+{
+  if (sizeof (uint64_t) == sizeof (unsigned long))
+    return __builtin_clzl (x);
+  else
+    return __builtin_clzll (x);
+}
+
+static inline int
+ctz_uint64 (uint64_t x)
+{
+  if (sizeof (uint64_t) == sizeof (unsigned long))
+    return __builtin_ctzl (x);
+  else
+    return __builtin_ctzll (x);
+}
+
 #if TOINT_INTRINSICS
 /* Round x to nearest int in all rounding modes, ties have to be rounded
    consistently with converttoint so the results match.  If the result
@@ -88,6 +106,49 @@ issignaling_inline (double x)
   return 2 * (ix ^ 0x0008000000000000) > 2 * 0x7ff8000000000000ULL;
 }
 
+#define BIT_WIDTH 64
+#define MANTISSA_WIDTH 52
+#define EXPONENT_WIDTH 11
+#define MANTISSA_MASK UINT64_C(0x000fffffffffffff)
+#define EXPONENT_MASK UINT64_C(0x7ff0000000000000)
+#define EXP_MANT_MASK UINT64_C(0x7fffffffffffffff)
+#define QUIET_NAN_MASK UINT64_C(0x0008000000000000)
+#define SIGN_MASK UINT64_C(0x8000000000000000)
+
+static inline bool
+is_nan (uint64_t x)
+{
+  return (x & EXP_MANT_MASK) > EXPONENT_MASK;
+}
+
+static inline uint64_t
+get_mantissa (uint64_t x)
+{
+  return x & MANTISSA_MASK;
+}
+
+/* Convert integer number X, unbiased exponent EP, and sign S to double:
+
+   result = X * 2^(EP+1 - exponent_bias)
+
+   NB: zero is not supported.  */
+static inline double
+make_double (uint64_t x, int64_t ep, uint64_t s)
+{
+  int lz = clz_uint64 (x) - EXPONENT_WIDTH;
+  x <<= lz;
+  ep -= lz;
+
+  if (__glibc_unlikely (ep < 0 || x == 0))
+    {
+      x >>= -ep;
+      ep = 0;
+    }
+
+  return asdouble (s + x + (ep << MANTISSA_WIDTH));
+}
+
 #define NOINLINE __attribute__ ((noinline))
 
 /* Error handling tail calls for special cases, with a sign argument.
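As a hand-checked sanity example of the new helper (mine, not part of the patch): make_double (1, 1074, 0) finds lz = 52, normalizes x to 1 << 52 and ep to 1022, and asdouble then sees exponent field 1023 with a zero mantissa, returning 1.0; this matches result = X * 2^(EP+1 - exponent_bias) once exponent_bias is read as the IEEE bias plus MANTISSA_WIDTH (1023 + 52 = 1075).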