path: root/math
Age | Commit message | Author | Files | Lines
2025-12-05 | aarch64: Implement AdvSIMD and SVE rsqrt(f) routines | James Chesterman | 1 | -1/+1
Vector variants of the new C23 rsqrt routines for both AdvSIMD and SVE, as well as in both single and double precision. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-11-27 | Define C23 header version macros | Joseph Myers | 4 | -0/+16
C23 defines library macros __STDC_VERSION_<header>_H__ to indicate that a header has support for new / changed features from C23. Now that all the required library features are implemented in glibc, define these macros.

I'm not sure this is sufficiently much of a user-visible feature to be worth a mention in NEWS.

Tested for x86_64.

There are various optional C23 features we don't yet have, of which I might look at the Annex H ones (floating-point encoding conversion functions and _Float16 functions) next.

* Optional time bases TIME_MONOTONIC, TIME_ACTIVE, TIME_THREAD_ACTIVE. See <https://sourceware.org/pipermail/libc-alpha/2023-June/149264.html> - we need to review / update that patch. (I think patch 2/2, inventing new names for all the nonstandard CLOCK_* supported by the Linux kernel, is rather more dubious.)
* Updating conform/ tests for C23.
* Defining the rounding mode macro FE_TONEARESTFROMZERO for RISC-V (as far as I know, the only architecture supported by glibc that has hardware support for this rounding mode for binary floating point) and supporting it throughout glibc and its tests (especially the string/numeric conversions in both directions that explicitly handle each possible rounding mode, and various tests that do likewise).
* Annex H floating-point encoding conversion functions. (It's not entirely clear which are optional even given support for Annex H; there's some wording applied inconsistently about only being required when non-arithmetic interchange formats are supported; see the comments I raised on the WG14 reflector on 23 Oct 2025.)
* _Float16 functions (and other header and testcase support for this type).
* Decimal floating-point support.
* Fully supporting __int128 and unsigned __int128 as integer types wider than intmax_t, as permitted by C23. Would need doing in coordination with GCC, see GCC bug 113887 for more discussion of what's involved.
2025-11-26 | math: Sync atanh from CORE-MATH | Adhemerval Zanella | 2 | -0/+280
The CORE-MATH commit dc9465e7 fixes some issues:

  Failure: Test: atanh_towardzero (0x8.3f79103b3c64p-4)
  Result:
   is:          5.7018661316561103e-01   0x1.23ef7ff0539c6p-1
   should be:   5.7018661316561092e-01   0x1.23ef7ff0539c5p-1
   difference:  1.1102230246251565e-16   0x1.0000000000000p-53
   ulp       :  1.0000
   max.ulp   :  0.0000
  Failure: Test: atanh_towardzero (0x8.3f7d95aabaf7p-4)
  Result:
   is:          5.7019248543911060e-01   0x1.23f044fac5997p-1
   should be:   5.7019248543911049e-01   0x1.23f044fac5996p-1
   difference:  1.1102230246251565e-16   0x1.0000000000000p-53
   ulp       :  1.0000
   max.ulp   :  0.0000
  Failure: Test: atanh_towardzero (0x8.3f805380d6728p-4)
  Result:
   is:          5.7019604623795527e-01   0x1.23f0bc75cd113p-1
   should be:   5.7019604623795516e-01   0x1.23f0bc75cd112p-1
   difference:  1.1102230246251565e-16   0x1.0000000000000p-53
   ulp       :  1.0000
   max.ulp   :  0.0000
  Maximal error of `atanh_towardzero'
   is      : 1 ulp
   accepted: 0 ulp

Checked on x86_64-linux-gnu, x86_64-linux-gnu-v3, aarch64-linux-gnu, and i686-linux-gnu.
2025-11-21 | configure: Only use -fno-fp-int-builtin-inexact if compiler supports it | Adhemerval Zanella | 1 | -16/+16
Checked on x86_64-linux-gnu. Reviewed-by: Sam James <sam@gentoo.org>
2025-11-19 | math: Sync atanh from CORE-MATH | Adhemerval Zanella | 2 | -0/+70
The CORE-MATH commit 703d7487 fixes some issues for RNDZ:

  Failure: Test: atanh_towardzero (0x5.96200b978b69cp-4)
  Result:
   is:          3.6447730550366463e-01   0x1.753989ed16faap-2
   should be:   3.6447730550366458e-01   0x1.753989ed16fa9p-2
   difference:  5.5511151231257827e-17   0x1.0000000000000p-54
   ulp       :  1.0000
   max.ulp   :  0.0000
  Maximal error of `atanh_towardzero'
   is      : 1 ulp
   accepted: 0 ulp

Checked on x86_64-linux-gnu, x86_64-linux-gnu-v3, aarch64-linux-gnu, and i686-linux-gnu.
2025-11-19 | math: Sync acosh from CORE-MATH | Adhemerval Zanella | 2 | -0/+140
The CORE-MATH commit 6736002f fixes some issues for RNDZ:

  Failure: Test: acosh_towardzero (0x1.08000c1e79fp+0)
  Result:
   is:          2.4935636091994373e-01   0x1.feae8c399b18cp-3
   should be:   2.4935636091994370e-01   0x1.feae8c399b18bp-3
   difference:  2.7755575615628913e-17   0x1.0000000000000p-55
   ulp       :  1.0000
   max.ulp   :  0.0000
  Failure: Test: acosh_towardzero (0x1.080016353964ep+0)
  Result:
   is:          2.4935874767710369e-01   0x1.feafcc91f518ep-3
   should be:   2.4935874767710367e-01   0x1.feafcc91f518dp-3
   difference:  2.7755575615628913e-17   0x1.0000000000000p-55
   ulp       :  1.0000
   max.ulp   :  0.0000
  Maximal error of `acosh_towardzero'
   is      : 1 ulp
   accepted: 0 ulp

This only happens when the ISA supports fma, such as x86_64-v3, aarch64, or powerpc.

Checked on x86_64-linux-gnu, x86_64-linux-gnu-v3, aarch64-linux-gnu, and i686-linux-gnu.
2025-11-17 | math: Don't redirect inlined builtin math functions | Adhemerval Zanella | 2 | -3/+0
When we want to inline builtin math functions, like truncf, for

  extern float truncf (float __x) __attribute__ ((__nothrow__ )) __attribute__ ((__const__));
  extern float __truncf (float __x) __attribute__ ((__nothrow__ )) __attribute__ ((__const__));

  float (truncf) (float) asm ("__truncf");

the compiler may redirect truncf calls to __truncf instead of inlining them (for instance, clang). USE_TRUNCF_BUILTIN is 1 to indicate that truncf should be inlined. In this case, we don't want the truncf redirection:

1. For each math function which may be inlined, we define

  #if USE_TRUNCF_BUILTIN
  # define NO_truncf_BUILTIN inline_truncf
  #else
  # define NO_truncf_BUILTIN truncf
  #endif

in <math-use-builtins.h>.

2. Include <math-use-builtins.h> in include/math.h.

3. Change MATH_REDIRECT to

  #define MATH_REDIRECT(FUNC, PREFIX, ARGS) \
    float (NO_ ## FUNC ## f ## _BUILTIN) (ARGS (float)) \
      asm (PREFIX #FUNC "f");

With this change, if USE_TRUNCF_BUILTIN is 0, we get

  float (truncf) (float) asm ("__truncf");

and truncf will be redirected to __truncf. For USE_TRUNCF_BUILTIN 1, we get:

  float (inline_truncf) (float) asm ("__truncf");

In both cases either truncf will be inlined or the internal alias (__truncf) will be called.

This is not required for all math-use-builtin symbols, only the ones defined in math.h. It also allows removing all the math-use-builtin inclusions, since the header is now implicitly included by math.h.

For MIPS, some math-use-builtin headers include sysdep.h and this in turn includes a lot of extra headers that do not allow ldbl-128 code to override alias definition (math.h will include some stdlib.h definition). The math-use-builtin only requires the __mips_isa_rev, so move the definition to sgidefs.h.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-11-13 | Change fromfp functions to return floating types following C23 (bug 28327) | Joseph Myers | 21 | -123/+40843
As discussed in bug 28327, C23 changed the fromfp functions to return floating types instead of intmax_t / uintmax_t. (Although the motivation in N2548 was reducing the use of intmax_t in library interfaces, the new version does have the advantage of being able to specify arbitrary integer widths for e.g. assigning the result to a _BitInt, as well as being able to indicate an error case in-band with a NaN return.) As with other such changes from interfaces introduced in TS 18661, implement the new types as a replacement for the old ones, with the old functions remaining as compat symbols but not supported as an API. The test generator used for many of the tests is updated to handle both versions of the functions. Tested for x86_64 and x86, and with build-many-glibcs.py. Also tested tgmath tests for x86_64 with GCC 7 to make sure that the modified case for older compilers in <tgmath.h> does work. Also tested for powerpc64le to cover the ldbl-128ibm implementation and the other things that are handled differently for that configuration. The new tests fail for ibm128, but all the failures relate to incorrect signs of zero results and turn out to arise from bugs in the underlying roundl, ceill, truncl and floorl implementations that I've reported in bug 33623, rather than indicating any bug in the actual new implementation of the functions for that format. So given fixes for those functions (which shouldn't be hard, and of course should add to the tests for those functions rather than relying only on indirect testing via fromfp), the fromfp tests should start passing for ibm128 as well.
2025-11-10 | math: Sync acosh from CORE-MATH | Adhemerval Zanella | 2 | -0/+70
The CORE-MATH commit c9abdf80 fixes some cases for RNDZ. Checked on x86_64-linux-gnu.
2025-11-05 | math: Remove the SVID error handling from tgammaf | Adhemerval Zanella | 2 | -4/+6
It improves latency by about 1.5% and throughput by about 2-4%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-11-05 | math: Remove the SVID error handling from lgammaf/lgammaf_r | Adhemerval Zanella | 5 | -12/+19
It improves latency and throughput by about 2%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-11-05 | math: Remove the SVID error handling from atan2f | Adhemerval Zanella | 2 | -3/+9
It improves latency by about 3-6% and throughput by about 5-12%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-11-04 | Rename fromfp files in preparation for changing types for C23 | Joseph Myers | 6 | -20/+20
As discussed in bug 28327, the fromfp functions changed type in C23 (compared to the version in TS 18661-1); they now return the same type as the floating-point argument, instead of intmax_t / uintmax_t. As with other such incompatible changes compared to the initial TS 18661 versions of interfaces (the types of totalorder functions, in particular), it seems appropriate to support only the new version as an API, not the old one (although many programs written for the old API might in fact work with the new one as well).

Thus, the existing implementations should become compat symbols. They are sufficiently different from how I'd expect to implement the new version that using separate implementations in separate files is more convenient than trying to share code, and directly sharing testcases would be problematic as well.

Rename the existing fromfp implementation and test files to names reflecting how they're intended to become compat symbols, so freeing up the existing filenames for a subsequent implementation of the C23 versions of these functions (which is the point at which the existing implementations would actually become compat symbols).

gen-fromfp-tests.py and gen-fromfp-tests-inputs are not renamed; I think it will make sense to adapt the test generator to be able to generate most tests for both versions of the functions (with extra test inputs added that are only of interest with the C23 version). The ldbl-opt/nldbl-* files are also not renamed; since those are for a static only library, no compat versions are needed, and they'll just have their contents changed when the C23 version is implemented.

Tested for x86_64, and with build-many-glibcs.py.
2025-11-04 | Add C23 long_double_t, _FloatN_t | Joseph Myers | 2 | -8/+357
C23 Annex H adds <math.h> typedefs long_double_t and _FloatN_t (originally introduced in TS 18661-3), analogous to float_t and double_t. Add these typedefs to glibc. (There are no _FloatNx_t typedefs.) C23 also slightly changes the rules for how such typedef names should be defined, compared to the definition in TS 18661-3. In both cases, <TYPE>_t corresponds to the evaluation format for <TYPE>, as specified by FLT_EVAL_METHOD (for which <math.h> uses glibc's internal __GLIBC_FLT_EVAL_METHOD). Specifically, each FLT_EVAL_METHOD value corresponds to some type U (for example, 64 corresponds to U = _Float64), and for types with exactly the same set of values as U, TS 18661-3 says expressions with those types are to be evaluated to the range and precision of type U (so <TYPE>_t is defined to U), whereas C23 only does that for types whose values are a strict subset of those of type U (so <TYPE>_t is defined to <TYPE>). As with other cases where semantics changed between TS 18661 and C23, this patch only implements the newer version of the semantics (including adjusting existing definitions of float_t and double_t as needed). The new semantics are contradictory between the main standard and Annex H for the case of FLT_EVAL_METHOD == 2 and the choice of double_t when double and long double have the same values (the main standard says it's defined as long double in that case, whereas Annex H would define it as double), which I've raised on the WG14 reflector (but I think setting FLT_EVAL_METHOD == 2 when double and long double have the same values is a fairly theoretical combination of features); for now glibc follows the value in the main standard in that case. 
Note that I think all existing GCC targets supported by glibc only use values -1, 0, 1, 2 or 16 for FLT_EVAL_METHOD (so most of the header code is somewhat theoretical, though potentially relevant with other compilers since the choice of FLT_EVAL_METHOD is only an API choice, not an ABI one; it can vary with compiler options, and these typedefs should not be used in ABIs). The testcase (expanded to cover the new typedefs) is really just repeating the same logic in a second place (so all it really tests is that __GLIBC_FLT_EVAL_METHOD is consistent with FLT_EVAL_METHOD). Tested for x86_64 and x86, and with build-many-glibcs.py.
2025-11-04 | math: Remove the SVID error handling wrapper from sqrt | Adhemerval Zanella | 2 | -4/+5
i386 and m68k architectures should use math-use-builtins-sqrt.h rather than relying on architecture-specific or inline assembly implementations. The PowerPC optimization for PPC 601/603 (30 years old) is removed. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-11-04 | math: Remove the SVID error handling from sinhf | Adhemerval Zanella | 2 | -3/+9
It improves latency by about 3-10% and throughput by about 5-15%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-11-04 | math: Remove the SVID error handling from remainder | Adhemerval Zanella | 2 | -6/+16
The optimized i386 version is faster than the generic one, and gcc implements it through the builtin. This optimization enables us to migrate the implementation to a C version. The performance on a Zen3 chip is similar to the SVID one.

The m68k provided an optimized version through __m81_u(remainderf) (mathimpl.h), and gcc does not implement it through a builtin (different than i386).

Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):

  reciprocal-throughput  input              master    NO-SVID  improvement
  x86_64                 subnormals        18.8522    16.2506       13.80%
  x86_64                 normal           421.8260   403.9270        4.24%
  x86_64                 close-exponent    21.0579    18.7642       10.89%
  i686                   subnormals        21.3443    21.4229       -0.37%
  i686                   normal           525.8380    538.807       -2.47%
  i686                   close-exponent    21.6589    21.7983       -0.64%

Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-11-04 | math: Remove the SVID error handling from remainderf | Adhemerval Zanella | 2 | -4/+6
The optimized i386 version is faster than the generic one, and gcc implements it through the builtin. This optimization enables us to migrate the implementation to a C version. The performance on a Zen3 chip is similar to the SVID one.

The m68k provided an optimized version through __m81_u(remainderf) (mathimpl.h), and gcc does not implement it through a builtin (different than i386).

Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):

  reciprocal-throughput  input              master    NO-SVID  improvement
  x86_64                 subnormals        17.5349    15.6125       10.96%
  x86_64                 normal            53.8134    52.5754        2.30%
  x86_64                 close-exponent    20.0211    18.6656        6.77%
  i686                   subnormals        21.8105    20.1856        7.45%
  i686                   normal            73.1945    71.2199        2.70%
  i686                   close-exponent    22.2141     20.331        8.48%

Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-10-31 | math: Remove xfail from pow test [BZ #33563] | Wilco Dijkstra | 2 | -67/+71
Remove xfail from pow testcase since pow and powf have been fixed. Also check float128 maximum value. See BZ #33563. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-10-30 | math: Remove the SVID error handling wrapper from yn/jn | Adhemerval Zanella | 2 | -5/+7
Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-10-30 | math: Remove the SVID error handling wrapper from y1/j1 | Adhemerval Zanella | 2 | -5/+7
Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-10-30 | math: Remove the SVID error handling wrapper from y0/j0 | Adhemerval Zanella | 2 | -5/+8
Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-10-30 | math: Remove the SVID error handling from coshf | Adhemerval Zanella | 2 | -3/+10
It improves latency by about 3-10% and throughput by about 5-15%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-10-30 | math: Remove the SVID error handling from atanhf | Adhemerval Zanella | 2 | -3/+9
It improves latency by about 1-10% and throughput by about 5-10%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-10-30 | math: Remove the SVID error handling from acoshf | Adhemerval Zanella | 2 | -3/+4
It improves latency by about 3-7% and throughput by about 5-10%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-10-30 | math: Remove the SVID error handling from asinf | Adhemerval Zanella | 3 | -3/+16
It improves latency by about 2% and throughput by about 5%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-10-30 | math: Remove the SVID error handling from acosf | Adhemerval Zanella | 2 | -3/+8
It improves latency by about 2-10% and throughput by about 5-10%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-10-30 | math: Remove the SVID error handling from log10f | Adhemerval Zanella | 2 | -3/+11
It improves latency by about 3-10% and throughput by about 5-10%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-10-27 | math: Consolidate erf/erfc definitions | Adhemerval Zanella | 1 | -0/+1
The common code definitions are consolidated in s_erf_common.h and s_erf_common.c. Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 | math: Consolidate internal erf/erfc tables | Adhemerval Zanella | 1 | -0/+2
The shared internal data definitions are consolidated in s_erf_data.c and the erfc-only ones are moved to s_erfc_data.c. Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 | math: Use erfc from CORE-MATH | Adhemerval Zanella | 2 | -0/+72
The current implementation shows the following accuracy on three ranges ([-DBL_MAX, -5], [-5, 5], [5, DBL_MAX]) with 10e9 uniform randomly generated numbers for each range (first column is the accuracy in ULP, with '0' being correctly rounded, second is the number of samples with the corresponding precision):

* Range [-DBL_MAX, -5]
  * FE_TONEAREST
      0: 10000000000  100.00%
  * FE_UPWARD
      0: 10000000000  100.00%
  * FE_DOWNWARD
      0: 10000000000  100.00%
  * FE_TOWARDZERO
      0: 10000000000  100.00%

* Range [-5, 5]
  * FE_TONEAREST
      0:  8069309665   80.69%
      1:  1882910247   18.83%
      2:    47485296    0.47%
      3:      293749    0.00%
      4:        1043    0.00%
  * FE_UPWARD
      0:  5540301026   55.40%
      1:  2026739127   20.27%
      2:  1774882486   17.75%
      3:   567324466    5.67%
      4:    86913847    0.87%
      5:     3820789    0.04%
      6:       18259    0.00%
  * FE_DOWNWARD
      0:  5520969586   55.21%
      1:  2057293099   20.57%
      2:  1778334818   17.78%
      3:   557521494    5.58%
      4:    82473927    0.82%
      5:     3393276    0.03%
      6:       13800    0.00%
  * FE_TOWARDZERO
      0:  6220287175   62.20%
      1:  2323846149   23.24%
      2:  1251999920   12.52%
      3:   190748245    1.91%
      4:    12996232    0.13%
      5:      122279    0.00%

* Range [5, DBL_MAX]
  * FE_TONEAREST
      0: 10000000000  100.00%
  * FE_UPWARD
      0: 10000000000  100.00%
  * FE_DOWNWARD
      0: 10000000000  100.00%
  * FE_TOWARDZERO
      0: 10000000000  100.00%

The CORE-MATH implementation is correctly rounded for any rounding mode. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

  reciprocal-throughput     master    patched  improvement
  x86_64                   49.0980   267.0660     -443.94%
  x86_64v2                 49.3220   257.6310     -422.34%
  x86_64v3                 42.9539    84.9571      -97.79%
  aarch64                  28.7266    52.9096      -84.18%
  power10                  14.1673    25.1273      -77.36%

  Latency                   master    patched  improvement
  x86_64                   95.6640   269.7060     -181.93%
  x86_64v2                 95.8296   260.4860     -171.82%
  x86_64v3                 91.1658   112.7150      -23.64%
  aarch64                  37.0745    58.6791      -58.27%
  power10                  23.3197    31.5737      -35.39%

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 | math: Use erf from CORE-MATH | Adhemerval Zanella | 2 | -0/+141
The current implementation shows the following accuracy on three ranges ([-DBL_MAX, -4.2], [-4.2, 4.2], [4.2, DBL_MAX]) with 10e9 uniform randomly generated numbers for each range (first column is the accuracy in ULP, with '0' being correctly rounded, second is the number of samples with the corresponding precision):

* Range [-DBL_MAX, -4.2]
  * FE_TONEAREST
      0: 10000000000  100.00%
  * FE_UPWARD
      0: 10000000000  100.00%
  * FE_DOWNWARD
      0: 10000000000  100.00%
  * FE_TOWARDZERO
      0: 10000000000  100.00%

* Range [-4.2, 4.2]
  * FE_TONEAREST
      0:  9764404513   97.64%
      1:   235595487    2.36%
  * FE_UPWARD
      0:  9468013928   94.68%
      1:   531986072    5.32%
  * FE_DOWNWARD
      0:  9493787693   94.94%
      1:   506212307    5.06%
  * FE_TOWARDZERO
      0:  9585271351   95.85%
      1:   414728649    4.15%

* Range [4.2, DBL_MAX]
  * FE_TONEAREST
      0: 10000000000  100.00%
  * FE_UPWARD
      0: 10000000000  100.00%
  * FE_DOWNWARD
      0: 10000000000  100.00%
  * FE_TOWARDZERO
      0: 10000000000  100.00%

The CORE-MATH implementation is correctly rounded for any rounding mode. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

  reciprocal-throughput     master    patched  improvement
  x86_64                   38.2754    78.0311     -103.87%
  x86_64v2                 38.3325    75.7555      -97.63%
  x86_64v3                 34.6604    28.3182       18.30%
  aarch64                  23.1499    21.4307        7.43%
  power10                  12.3051     9.3766       23.80%

  Latency                   master    patched  improvement
  x86_64                   84.3062   121.3580      -43.95%
  x86_64v2                 84.1817   117.4250      -39.49%
  x86_64v3                 81.0933    70.6458       12.88%
  aarch64                   35.012    29.5012       15.74%
  power10                  21.7205    18.4589       15.02%

For x86_64/x86_64-v2, most of the performance hit came from the fma call through the ifunc mechanism.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 | math: Use tgamma from CORE-MATH | Adhemerval Zanella | 1 | -1/+2
The current implementation shows the following accuracy on one range ([-20, 20]) with 10e9 uniform randomly generated numbers (first column is the accuracy in ULP, with '0' being correctly rounded, second is the number of samples with the corresponding precision):

* Range [-20, 20]
  * FE_TONEAREST
      0:  4504877808   45.05%
      1:  4402224940   44.02%
      2:   947652295    9.48%
      3:   131076831    1.31%
      4:    13222216    0.13%
      5:      910045    0.01%
      6:       35253    0.00%
      7:         606    0.00%
      8:           6    0.00%
  * FE_UPWARD
      0:  3477307921   34.77%
      1:  4838637866   48.39%
      2:  1413942684   14.14%
      3:   240762564    2.41%
      4:    27113094    0.27%
      5:     2130934    0.02%
      6:      102599    0.00%
      7:        2324    0.00%
      8:          14    0.00%
  * FE_DOWNWARD
      0:  3923545410   39.24%
      1:  4745067290   47.45%
      2:  1137899814   11.38%
      3:   171596912    1.72%
      4:    20013805    0.20%
      5:     1773899    0.02%
      6:       99911    0.00%
      7:        2928    0.00%
      8:          31    0.00%
  * FE_TOWARDZERO
      0:  3697160741   36.97%
      1:  4731951491   47.32%
      2:  1303092738   13.03%
      3:   231969191    2.32%
      4:    32344517    0.32%
      5:     3283092    0.03%
      6:      193010    0.00%
      7:        5175    0.00%
      8:          45    0.00%

The CORE-MATH implementation is correctly rounded for any rounding mode. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

  reciprocal-throughput     master    patched  improvement
  x86_64                  237.7960   175.4090       26.24%
  x86_64v2                232.9320   163.4460       29.83%
  x86_64v3                193.0680    89.7721       53.50%
  aarch64                 113.6340    56.7350       50.07%
  power10                  92.0617    26.6137       71.09%

  Latency                   master    patched  improvement
  x86_64                  266.7190   208.0130       22.01%
  x86_64v2                263.6070   200.0280       24.12%
  x86_64v3                214.0260   146.5180       31.54%
  aarch64                 114.4760    58.5235       48.88%
  power10                  84.3718    35.7473       57.63%

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 | math: Use lgamma from CORE-MATH | Adhemerval Zanella | 1 | -2/+4
The current implementation shows the following accuracy on two ranges ([-20, 20] and [20, 0x5.d53649e2d4674p+1012]) with 10e9 uniform randomly generated numbers for each range (first column is the accuracy in ULP, with '0' being correctly rounded, second is the number of samples with the corresponding precision):

* Range [-20, 20]
  * FE_TONEAREST
      0:  6701254075   67.01%
      1:  3230897408   32.31%
      2:    63986940    0.64%
      3:     3605417    0.04%
      4:      233189    0.00%
      5:       20973    0.00%
      6:        1869    0.00%
      7:         125    0.00%
      8:           4    0.00%
  * FE_UPWARD
      0:  4207428861   42.07%
      1:  5001137116   50.01%
      2:   740542213    7.41%
      3:    49116304    0.49%
      4:     1715617    0.02%
      5:       54464    0.00%
      6:        4956    0.00%
      7:         451    0.00%
      8:          16    0.00%
      9:           2    0.00%
  * FE_DOWNWARD
      0:  4155925193   41.56%
      1:  4989821364   49.90%
      2:   770312796    7.70%
      3:    72014726    0.72%
      4:    11040522    0.11%
      5:      872811    0.01%
      6:       12480    0.00%
      7:         106    0.00%
      8:           2    0.00%
  * FE_TOWARDZERO
      0:  4225861532   42.26%
      1:  5027051105   50.27%
      2:   706443411    7.06%
      3:    39877908    0.40%
      4:      713109    0.01%
      5:       47513    0.00%
      6:        4961    0.00%
      7:         438    0.00%
      8:          23    0.00%

* Range [20, 0x5.d53649e2d4674p+1012]
  * FE_TONEAREST
      0:  7262241995   72.62%
      1:  2737758005   27.38%
  * FE_UPWARD
      0:  4690392401   46.90%
      1:  5143728216   51.44%
      2:   165879383    1.66%
  * FE_DOWNWARD
      0:  4690333331   46.90%
      1:  5143794937   51.44%
      2:   165871732    1.66%
  * FE_TOWARDZERO
      0:  4690343071   46.90%
      1:  5143786761   51.44%
      2:   165870168    1.66%

The CORE-MATH implementation is correctly rounded for any rounding mode. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows:

  reciprocal-throughput     master    patched  improvement
  x86_64                  112.9740   135.8640      -20.26%
  x86_64v2                111.8910   131.7590      -17.76%
  x86_64v3                108.2800    68.0935       37.11%
  aarch64                  61.3759    49.2403       19.77%
  power10                  42.4483    24.1943       43.00%

  Latency                   master    patched  improvement
  x86_64                  144.0090   167.9750      -16.64%
  x86_64v2                139.2690   167.1900      -20.05%
  x86_64v3                130.1320    96.9347       25.51%
  aarch64                  66.8538    53.2747       20.31%
  power10                  49.5076    29.6917       40.03%

For x86_64/x86_64-v2, most of the performance hit came from the fma call through the ifunc mechanism.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 | math: Move atanh internal data to separate file | Adhemerval Zanella | 1 | -0/+1
The internal data definitions are moved to s_atanh_data.c. It helps on ABIs that build the implementation multiple times for ifunc optimizations, like x86_64. Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-27 | math: Consolidate acosh and asinh internal table | Adhemerval Zanella | 1 | -0/+1
The shared internal data definitions are consolidated in s_asincosh_data.c. Reviewed-by: DJ Delorie <dj@redhat.com>
2025-10-22 | various fixes detected with -Wdouble-promotion | Paul Zimmermann | 3 | -3/+3
Changes with respect to v1: - added a comment in e_j1f.c to explain that using float is enough. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-10-21 | Simplify powl computation for small integral y [BZ #33411] | Siddhesh Poyarekar | 2 | -0/+68
The powl implementation for x86_64 ends up multiplying X once more than necessary and then throwing away that result. This results in an overflow flag being set in cases where there is no overflow. Simplify the relevant portion by special casing the -3 to 3 range and simply multiplying repeatedly. Resolves: BZ #33411 Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2025-10-21 | math: Fix compare sort function on compoundn | Adhemerval Zanella | 1 | -2/+2
Use the fabs variant that matches the type in use, instead of the double variant. This fixes a build issue with clang:

  ./s_compoundn_template.c:64:14: error: absolute value function 'fabs' given an argument of type 'const long double' but has parameter of type 'double' which may cause truncation of value [-Werror,-Wabsolute-value]
     64 |   FLOAT pd = fabs (*(const FLOAT *) p);
        |              ^
  ./s_compoundn_template.c:64:14: note: use function 'fabsl' instead
     64 |   FLOAT pd = fabs (*(const FLOAT *) p);
        |              ^~~~
        |              fabsl

Reviewed-by: Collin Funk <collin.funk1@gmail.com>
2025-10-21 | math: Suppress more aliases builtin type conflicts | Adhemerval Zanella | 1 | -3/+36
Reviewed-by: Sam James <sam@gentoo.org>
2025-10-21 | math: Suppress clang -Wabsolute-value warning on math_check_force_underflow | Adhemerval Zanella | 1 | -1/+10
clang warns:

  ../sysdeps/x86/fpu/powl_helper.c:233:3: error: absolute value function '__builtin_fabsf' given an argument of type 'typeof (res)' (aka 'long double') but has parameter of type 'float' which may cause truncation of value [-Werror,-Wabsolute-value]
    math_check_force_underflow (res);
    ^
  ./math-underflow.h:45:11: note: expanded from macro 'math_check_force_underflow'
        if (fabs_tg (force_underflow_tmp) \
            ^
  ./math-underflow.h:27:20: note: expanded from macro 'fabs_tg'
  #define fabs_tg(x) __MATH_TG ((x), (__typeof (x)) __builtin_fabs, (x))
                     ^
  ../math/math.h:899:16: note: expanded from macro '__MATH_TG'
                 float: FUNC ## f ARGS, \
                 ^
  <scratch space>:73:1: note: expanded from here
  __builtin_fabsf
  ^

This is due to the use of _Generic from TG_MATH.

Reviewed-by: Sam James <sam@gentoo.org>
2025-10-14 | math: Use binary search on lgammaf slow path | Adhemerval Zanella | 2 | -0/+78
And remove some unused entries of the fallback table. Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-10-14 | math: Optimize fma call on log2pf1 | Adhemerval Zanella | 2 | -0/+26
The fma is required only for x == -0x1.da285cp-5 in FE_TONEAREST to provide correctly rounded results. Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-10-14 | math: Optimize fma call on asinpif | Adhemerval Zanella | 2 | -0/+52
The fma is required only for x == +/-0x1.6371e8p-4f in FE_TOWARDZERO to provide correctly rounded results. Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-10-14 | math: Update auto-libm-test-out-log2p1 | Adhemerval Zanella | 1 | -0/+1416
Commit 079728391084 did not update the log2p1 output with the newer values.
2025-09-27 | AArch64: Implement AdvSIMD and SVE log10p1(f) routines | Luna Lamb | 1 | -1/+1
Vector variants of the new C23 log10p1 routines. Note: Benchmark inputs for log10p1(f) are identical to log1p(f). Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-09-27 | AArch64: Implement AdvSIMD and SVE log2p1(f) routines | Luna Lamb | 1 | -1/+1
Vector variants of the new C23 log2p1 routines. Note: Benchmark inputs for log2p1(f) are identical to log1p(f). Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-09-11 | math: Add fetestexcept internal alias | Adhemerval Zanella | 1 | -1/+3
To avoid linknamespace issues on old standards. It is required if the fallback fma implementation is used, if/when fma is also used internally by other implementations. Reviewed-by: DJ Delorie <dj@redhat.com>
2025-09-11 | math: Add feclearexcept internal alias | Adhemerval Zanella | 1 | -0/+1
To avoid linknamespace issues on old standards. It is required if the fallback fma implementation is used, if/when fma is also used internally by other implementations. Reviewed-by: DJ Delorie <dj@redhat.com>
2025-09-02 | AArch64: Implement exp2m1 and exp10m1 routines | Hasaan Khan | 1 | -2/+2
Vector variants of the new C23 exp2m1 & exp10m1 routines. Note: Benchmark inputs for exp2m1 & exp10m1 are identical to exp2 & exp10 respectively; this also includes the floating point variations. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>