Age | Commit message | Author | Files | Lines |
|
tgmath3-macro-tests won't compile with <float.h> and <tgmath.h> from
Clang due to missing C23 support:
https://github.com/llvm/llvm-project/issues/97335
Disable them for now when Clang is used for testing so that "make check"
can finish.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Since -mamx-tile is used only for testing, check it with
LIBC_TRY_TEST_CC_COMMAND instead of LIBC_TRY_CC_AND_TEST_CC_COMMAND, and
don't check __builtin_ia32_ldtilecfg for Clang.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
|
|
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Fix assert.c so that even the fallback case conforms to POSIX.
Its output is deliberately not identical to that of the default
case, so that a test can tell the two apart.
Add a test that verifies that abort is called, and that the
message printed to stderr has all the info that POSIX requires.
Verify this even when malloc isn't usable.
Reviewed-by: Paul Eggert <eggert@cs.ucla.edu>
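For illustration only, a minimal sketch of how a fallback message could be assembled without touching the heap. The helper names `append` and `format_assert_fallback` and the exact message layout are hypothetical and are not taken from glibc's assert.c; the sketch only shows the four pieces of information POSIX asks for (expression text, file name, line number, function name) being written into a caller-supplied buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append SRC to BUF at *POS, never writing past BUF[SIZE - 1].  */
static void
append (char *buf, size_t size, size_t *pos, const char *src)
{
  size_t len = strlen (src);
  size_t room = size - 1 - *pos;
  if (len > room)
    len = room;                 /* Truncate instead of overflowing.  */
  memcpy (buf + *pos, src, len);
  *pos += len;
  buf[*pos] = '\0';
}

/* Hypothetical helper: build an assertion failure message in BUF
   without allocating memory.  Returns the length of the message.  */
static size_t
format_assert_fallback (char *buf, size_t size, const char *file,
                        unsigned int line, const char *function,
                        const char *expr)
{
  assert (size > 0);

  /* Convert LINE to decimal by hand, filling the array backwards.  */
  char digits[3 * sizeof line + 1];
  char *p = digits + sizeof digits - 1;
  *p = '\0';
  do
    *--p = (char) ('0' + line % 10);
  while ((line /= 10) != 0);

  size_t pos = 0;
  buf[0] = '\0';
  append (buf, size, &pos, file);
  append (buf, size, &pos, ":");
  append (buf, size, &pos, p);
  append (buf, size, &pos, ": ");
  append (buf, size, &pos, function);
  append (buf, size, &pos, ": Assertion `");
  append (buf, size, &pos, expr);
  append (buf, size, &pos, "' failed.\n");
  return pos;
}
```

A caller would pass a stack buffer and then write the result to stderr with a single write call before calling abort.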
|
|
After
commit 215447f5cbcf1a494cded57734f68d7f9c2b0dc0
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Tue Dec 17 06:18:55 2024 +0800
cet: Pass -mshstk to compiler for tst-cet-legacy-10a[-static].c
we can remove '#pragma GCC target' in tst-cet-legacy-10a[-static].c.
Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
|
|
POSIX states that "if a child process cannot be created, or if the
termination status for the command language interpreter cannot be
obtained, system() shall return -1 and set errno to indicate the error."
In the glibc implementation this can happen when posix_spawn fails,
which in turn happens when the underlying fork, vfork, or clone call
fails, for example with EAGAIN or ENOMEM.
Resolves: BZ #32450
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
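From the caller's side, the POSIX wording quoted above means that a return value of -1 has to be treated separately from a normal wait status. A minimal sketch, assuming a hypothetical wrapper named `run_shell` (not part of glibc):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical wrapper: return the exit status of COMMAND (0..255),
   or -1 if the child could not be created, its status could not be
   obtained, or it did not terminate normally.  */
static int
run_shell (const char *command)
{
  /* system (NULL) only asks whether a shell exists; exclude it.  */
  assert (command != NULL);

  int status = system (command);
  if (status == -1)
    {
      /* errno was set by system, for example to EAGAIN or ENOMEM.  */
      fprintf (stderr, "system: %s\n", strerror (errno));
      return -1;
    }
  if (WIFEXITED (status))
    return WEXITSTATUS (status);
  return -1;                    /* Killed by a signal, for example.  */
}
```

Calling `run_shell ("exit 3")` is expected to return 3 on a system where a shell is available.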
|
|
Clang has its own <tgmath.h> and doesn't use <tgmath.h> from glibc. Pass
"-I." to the compiler only if $($(<F)-no-include-dot) is undefined. Define
it to yes for tgmath tests when testing with Clang.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
|
|
Clang 19 takes a very long time to compile bug28.c; it ran for more than
27 minutes on an Intel Core i7-1195G7 before the process was killed:
https://github.com/llvm/llvm-project/issues/120462
Exclude it when Clang is used for testing.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
|
|
Also use is_rtld_link_map in dl-cet.c. This fixes BZ #32488.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
On powerpc math/test-ibm128-tanpi shows multiple failures:
testing long double (without inline functions)
Failure: tanpi_downward (0xfffffffffffffffdp-1): Exception "Divide by zero" not set
Failure: tanpi_downward (0xfffffffffffffffdp-1): errno set to 0, expected 34 (ERANGE)
Failure: Test: tanpi_downward (0xfffffffffffffffdp-1)
Result:
is: 4.68843873182857939141363635204365e+28 0x1.2efbb6629d1d59b032520400df8p+95
should be: inf inf
Failure: tanpi_downward (0x3fffffffffffffffffffffffffdp-1): Exception "Divide by zero" not set
Failure: tanpi_downward (0x3fffffffffffffffffffffffffdp-1): errno set to 0, expected 34 (ERANGE)
Failure: Test: tanpi_downward (0x3fffffffffffffffffffffffffdp-1)
Result:
is: 1.41444453325831960404472183124793e+16 0x1.9202627cbf98e052d5fdbeee1f8p+53
should be: inf inf
Failure: tanpi_downward (-0xf.ffffffffffffbffffffffffffcp+1020): Exception "Invalid operation" set
Failure: tanpi_downward (-0xf.ffffffffffffbffffffffffffcp+1020): Exception "Overflow" set
Failure: tanpi_downward (-0xf.ffffffffffffbffffffffffffcp+1020): errno set to 33, expected 0 (unchanged)
Failure: Test: tanpi_downward (-0xf.ffffffffffffbffffffffffffcp+1020)
Result:
is: qNaN
should be: -0.00000000000000000000000000000000e+00 -0x0.000000000000000000000000000p+0
Failure: Test: tanpi_downward (0x3.fffffffffffffffcp+108)
Result:
is: 2.91356019227449116879287504834896e-15 0x1.a3e365fee24d4632f95a2235698p-49
should be: 0.00000000000000000000000000000000e+00 0x0.000000000000000000000000000p+0
difference: 2.91356019227449116879287504834896e-15 0x1.a3e365fee24d4632f95a2235698p-49
ulp : 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321
max.ulp : 8.0000
Failure: Test: tanpi_downward (0x3.ffffffffffffffffffffffffffp+108)
Result:
is: 7.94911926685664643005642781870827e-16 0x1.ca3c4b83eb5688e1474146dc338p-51
should be: 0.00000000000000000000000000000000e+00 0x0.000000000000000000000000000p+0
difference: 7.94911926685664643005642781870827e-16 0x1.ca3c4b83eb5688e1474146dc338p-51
ulp : 160891965142034222272327839154722485473479235229008379884749401713481320342777314570400076204240982703218835644458374555276642
max.ulp : 8.0000
Failure: tanpi_towardzero (0xfffffffffffffffdp-1): Exception "Divide by zero" not set
Failure: tanpi_towardzero (0xfffffffffffffffdp-1): errno set to 0, expected 34 (ERANGE)
Failure: Test: tanpi_towardzero (0xfffffffffffffffdp-1)
Result:
is: 2.14718475310122677917055904836884e+28 0x1.1584624c14882fff76592b4ec10p+94
should be: inf inf
Failure: tanpi_towardzero (-0xfffffffffffffffdp-1): Exception "Divide by zero" not set
Failure: tanpi_towardzero (-0xfffffffffffffffdp-1): errno set to 0, expected 34 (ERANGE)
Failure: Test: tanpi_towardzero (-0xfffffffffffffffdp-1)
Result:
is: -2.14718475310122677917055904836884e+28 -0x1.1584624c14882fff76592b4ec10p+94
should be: -inf -inf
Failure: tanpi_towardzero (0x3fffffffffffffffffffffffffdp-1): Exception "Divide by zero" not set
Failure: tanpi_towardzero (0x3fffffffffffffffffffffffffdp-1): errno set to 0, expected 34 (ERANGE)
Failure: Test: tanpi_towardzero (0x3fffffffffffffffffffffffffdp-1)
Result:
is: 6.60739946234609289593176521179840e+15 0x1.7796511d79d6ce55bc8bf083fe0p+52
should be: inf inf
Failure: tanpi_towardzero (-0x3fffffffffffffffffffffffffdp-1): Exception "Divide by zero" not set
Failure: tanpi_towardzero (-0x3fffffffffffffffffffffffffdp-1): errno set to 0, expected 34 (ERANGE)
Failure: Test: tanpi_towardzero (-0x3fffffffffffffffffffffffffdp-1)
Result:
is: -6.60739946234609289593176521179840e+15 -0x1.7796511d79d6ce55bc8bf083fe0p+52
should be: -inf -inf
Failure: Test: tanpi_towardzero (-0x3.fffffffffffffffcp+108)
Result:
is: -1.17953443892757434921819283936141e-14 -0x1.a8f8d97fb893518cbe5688935c0p-47
should be: -0.00000000000000000000000000000000e+00 -0x0.000000000000000000000000000p+0
difference: 1.17953443892757434921819283936141e-14 0x1.a8f8d97fb893518cbe5688935c0p-47
ulp : 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321
max.ulp : 8.0000
Failure: Test: tanpi_towardzero (-0x3.ffffffffffffffffffffffffffp+108)
Result:
is: -1.85584803206881692897837494734542e-14 -0x1.4e51e25c1f5ab4470a3a0a42c24p-46
should be: -0.00000000000000000000000000000000e+00 -0x0.000000000000000000000000000p+0
difference: 1.85584803206881692897837494734542e-14 0x1.4e51e25c1f5ab4470a3a0a42c24p-46
ulp : 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321
max.ulp : 8.0000
Failure: Test: tanpi_towardzero (0x3.fffffffffffffffcp+108)
Result:
is: 1.17953443892757434921819283936141e-14 0x1.a8f8d97fb893518cbe5688935c0p-47
should be: 0.00000000000000000000000000000000e+00 0x0.000000000000000000000000000p+0
difference: 1.17953443892757434921819283936141e-14 0x1.a8f8d97fb893518cbe5688935c0p-47
ulp : 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321
max.ulp : 8.0000
Failure: Test: tanpi_towardzero (0x3.ffffffffffffffffffffffffffp+108)
Result:
is: 1.85584803206881692897837494734542e-14 0x1.4e51e25c1f5ab4470a3a0a42c24p-46
should be: 0.00000000000000000000000000000000e+00 0x0.000000000000000000000000000p+0
difference: 1.85584803206881692897837494734542e-14 0x1.4e51e25c1f5ab4470a3a0a42c24p-46
ulp : 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321
max.ulp : 8.0000
Failure: tanpi_upward (-0xfffffffffffffffdp-1): Exception "Divide by zero" not set
Failure: tanpi_upward (-0xfffffffffffffffdp-1): errno set to 0, expected 34 (ERANGE)
Failure: Test: tanpi_upward (-0xfffffffffffffffdp-1)
Result:
is: -2.14718475310122677917055904836884e+28 -0x1.1584624c14882fff76592b4ec10p+94
should be: -inf -inf
Failure: tanpi_upward (-0x3fffffffffffffffffffffffffdp-1): Exception "Divide by zero" not set
Failure: tanpi_upward (-0x3fffffffffffffffffffffffffdp-1): errno set to 0, expected 34 (ERANGE)
Failure: Test: tanpi_upward (-0x3fffffffffffffffffffffffffdp-1)
Result:
is: -6.60739946234609289593176521179829e+15 -0x1.7796511d79d6ce55bc8bf083fdbp+52
should be: -inf -inf
Failure: Test: tanpi_upward (-0x3.fffffffffffffffcp+108)
Result:
is: -1.17953443892757434921819283936138e-14 -0x1.a8f8d97fb893518cbe5688935b0p-47
should be: -0.00000000000000000000000000000000e+00 -0x0.000000000000000000000000000p+0
difference: 1.17953443892757434921819283936139e-14 0x1.a8f8d97fb893518cbe5688935b0p-47
ulp : inf
max.ulp : 8.0000
Failure: Test: tanpi_upward (-0x3.ffffffffffffffffffffffffffp+108)
Result:
is: -1.85584803206881692897837494734542e-14 -0x1.4e51e25c1f5ab4470a3a0a42c24p-46
should be: -0.00000000000000000000000000000000e+00 -0x0.000000000000000000000000000p+0
difference: 1.85584803206881692897837494734543e-14 0x1.4e51e25c1f5ab4470a3a0a42c24p-46
ulp : inf
max.ulp : 8.0000
Failure: tanpi_upward (0xf.ffffffffffffbffffffffffffcp+1020): Exception "Invalid operation" set
Failure: tanpi_upward (0xf.ffffffffffffbffffffffffffcp+1020): Exception "Overflow" set
Failure: tanpi_upward (0xf.ffffffffffffbffffffffffffcp+1020): errno set to 33, expected 0 (unchanged)
Failure: Test: tanpi_upward (0xf.ffffffffffffbffffffffffffcp+1020)
Result:
is: qNaN
should be: 0.00000000000000000000000000000000e+00 0x0.000000000000000000000000000p+0
|
|
This was discovered after extending elf/tst-audit23 to cover
dlclose of the dlmopen namespace.
Auditors already experience the new order during process
shutdown (_dl_fini), so no LAV_CURRENT bump or backwards
compatibility code seems necessary.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Previously, the ld.so link map was silently added to the namespace.
This change produces an auditing event for it.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
And include <stdbool.h> for a definition of bool.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
After commit 1d5024f4f052c12e404d42d3b5bfe9c3e9fd27c4
("support: Build with exceptions and asynchronous unwind tables
[BZ #30587]"), libgcc_s is expected to show up in the DSO
list on 32-bit Arm. Do not update max_objs because the vdso is not
tracked (which is the reason why the test currently passes
even with libgcc_s present).
Also write the log output from the auditor to standard output,
for easier test debugging.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
This avoids immediate GLIBC_PRIVATE ABI issues if the size of
struct link_map or struct auditstate changes.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Unconditionally define it to false for static builds.
This avoids the awkward use of weak_extern for _dl_rtld_map
in checks that cannot possibly be true on static builds.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Linux 6.12 adds a new constant F_CREATED_QUERY. Add it to glibc's
bits/fcntl-linux.h.
Tested for x86_64.
|
|
Add the new Linux 6.12 HWCAP_LOONGARCH_LSPW to the corresponding
bits/hwcap.h.
Tested with build-many-glibcs.py for loongarch64-linux-gnu-lp64d.
|
|
Linux 6.12 adds a constant MSG_SOCK_DEVMEM (recall that various
constants such as this one are defined in the non-uapi linux/socket.h
but still form part of the kernel/userspace interface, so that
non-uapi header is one that needs checking each release for new such
constants). Add it to glibc's bits/socket.h.
Tested for x86_64.
|
|
As seen on an Intel i9-9900K CPU, with glibc built with GCC 11.5,
configured with and without --disable-multi-arch.
|
|
As seen with an AMD 7950X CPU, on a glibc built with GCC 11.5.
|
|
Results from running on Neoverse-V2, built with GCC 11.5.
|
|
Neither NPTL nor Hurd defines this macro anymore.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
This matches kernel behavior. With this change, it is possible
to use utimensat as a replacement for the futimens interface,
similar to what glibc does internally.
Reviewed-by: Paul Eggert <eggert@cs.ucla.edu>
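A minimal sketch of the usage this enables, under the assumption that the change is about passing a null pathname to utimensat so that the call operates on the descriptor itself. The helper name `set_times_via_fd` is hypothetical, and the fallback to futimens on EINVAL reflects an assumption about how C libraries without this change reject the null pathname.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: open PATH and set its access and modification
   times through the descriptor.  Returns 0 on success, -1 with errno
   set on failure.  */
static int
set_times_via_fd (const char *path, const struct timespec times[2])
{
  assert (path != NULL);
  int fd = open (path, O_RDONLY);
  if (fd == -1)
    return -1;

  /* Null pathname: act on FD itself, as the kernel interface allows.  */
  const char *no_path = NULL;
  int ret = utimensat (fd, no_path, times, 0);

  /* Assumed behavior of a C library that rejects the null pathname.  */
  if (ret == -1 && errno == EINVAL)
    ret = futimens (fd, times);

  int saved_errno = errno;
  close (fd);
  errno = saved_errno;
  return ret;
}
```

Headers from older glibc releases may mark the pathname parameter as nonnull, so a compiler may warn about this call there.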
|
|
This padding is difficult to use for preserving the internal
GLIBC_PRIVATE ABI. The comment is misleading. Current Address
Sanitizer uses heuristics to determine struct pthread size.
It does not depend on its precise layout. It merely scans for
pointers allocated using malloc.
Due to the removal of the padding, the assert for its start
is no longer required.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
The current DSO dependency sorting tests are for a limited number of
specific cases, including some from particular bug reports.
Add tests that systematically cover all possible DAGs for an
executable and the shared libraries it depends on, directly or
indirectly, up to four objects (an executable and three shared
libraries). (For this kind of DAG - ones with a single source vertex
from which all others are reachable, and an ordering on the edges from
each vertex - there are 57 DAGs on four vertices, 3399 on five
vertices and 1026944 on six vertices; see
https://arxiv.org/pdf/2303.14710 for more details on this enumeration.
I've tested that the 3399 cases with five vertices do all pass if
enabled.)
These tests are replicating the sorting logic from the dynamic linker
(thereby, for example, asserting that it doesn't accidentally change);
I'm not claiming that the logic in the dynamic linker is in some
abstract sense optimal. Note that these tests do illustrate how in
some cases the two sorting algorithms produce different results for a
DAG (I think all the existing tests for such differences are ones
involving cycles, and the motivation for the new algorithm was also to
improve the handling of cycles):
tst-dso-ordering-all4-44: a->[bc];{}->[cba]
output(glibc.rtld.dynamic_sort=1): c>b>a>{}<a<b<c
output(glibc.rtld.dynamic_sort=2): b>c>a>{}<a<c<b
They also illustrate that sometimes the sorting algorithms do not
follow the order in which dependencies are listed in DT_NEEDED even
though there is a valid topological sort that does follow that, which
might be counterintuitive considering that the DT_NEEDED ordering is
followed in the simplest cases:
tst-dso-ordering-all4-56: {}->[abc]
output: c>b>a>{}<a<b<c
shows such a simple case following DT_NEEDED order for destructor
execution (the reverse of it for constructor execution), but
tst-dso-ordering-all4-41: a->[cb];{}->[cba]
output: c>b>a>{}<a<b<c
shows that c and b are in the opposite order to what might be expected
from the simplest case, though there is no dependency requiring such
an opposite order to be used.
(I'm not asserting that either of those things is a problem, simply
observing them as less obvious properties of the sorting algorithms
shown up by these tests.)
Tested for x86_64.
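To make the tst-dso-ordering-all4-44 example concrete, here is a small sketch that checks whether a constructor order respects the dependency edges. It does not reproduce either of the dynamic linker's sorting algorithms. It assumes the notation reads as "a needs b and c; the executable {} needs c, b and a", and that the part of each output line before "{}" is the constructor order. All identifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_OBJS = 8 };
enum { OBJ_A, OBJ_B, OBJ_C, OBJ_EXE, NUM_OBJS };

/* Dependency edges for all4-44 as read above: a->[bc];{}->[cba].
   all4_44[i][j] is true when object I needs object J.  */
static const bool all4_44[MAX_OBJS][MAX_OBJS] = {
  [OBJ_A] = { [OBJ_B] = true, [OBJ_C] = true },
  [OBJ_EXE] = { [OBJ_C] = true, [OBJ_B] = true, [OBJ_A] = true },
};

/* Constructor orders reported for the two sorting algorithms.  */
static const size_t order_sort1[] = { OBJ_C, OBJ_B, OBJ_A, OBJ_EXE };
static const size_t order_sort2[] = { OBJ_B, OBJ_C, OBJ_A, OBJ_EXE };

/* Return true if ORDER is a permutation of 0..N-1 in which every
   object is initialized only after all objects it needs.  */
static bool
is_valid_init_order (size_t n, const bool needs[MAX_OBJS][MAX_OBJS],
                     const size_t order[])
{
  assert (n <= MAX_OBJS);
  bool done[MAX_OBJS] = { false };
  for (size_t k = 0; k < n; k++)
    {
      size_t obj = order[k];
      if (obj >= n || done[obj])
        return false;           /* Not a permutation.  */
      for (size_t dep = 0; dep < n; dep++)
        if (needs[obj][dep] && !done[dep])
          return false;         /* A dependency is not ready yet.  */
      done[obj] = true;
    }
  return true;
}
```

Both `order_sort1` and `order_sort2` pass this check, which matches the observation that the two algorithms can pick different, equally valid topological orders for the same DAG.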
|
|
Linux 6.12 adds new ELF note types NT_X86_XSAVE_LAYOUT and NT_ARM_POE.
Add these to glibc's elf.h.
Tested for x86_64.
|
|
Linux 6.12 adds the SCHED_EXT constant. Add it to glibc's
bits/sched.h and update the kernel version in tst-sched-consts.py.
Tested for x86_64.
|
|
This change implements vfork.S for direct support of the vfork
syscall. clone.S is revised to correct child support for the
vfork case.
The main bug was creating a frame prior to the clone syscall.
This was done to allow the rp and r4 registers to be saved and
restored from the stack frame. r4 was used to save and restore
the PIC register, r19, across the system call and the call to
set errno. But in the vfork case, it is undefined behavior
for the child to return from the function in which vfork was
called. It is surprising that this usually worked.
Syscalls on hppa save and restore rp and r19, so we don't need
to create a frame prior to the clone syscall. We only need a
frame when __syscall_error is called. We also don't need to
save and restore r19 around the call to $$dyncall as r19 is not
used in the code after $$dyncall.
This considerably simplifies clone.S.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
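The constraint mentioned above, that the vfork child must not return from the function that called vfork, looks like this from the C side. This is a generic usage sketch with a hypothetical helper name, not code from the hppa port: the child only calls _exit, and the parent collects the status.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a child with vfork that exits at once
   with CODE, and return the exit status seen by the parent, or -1 on
   failure.  */
static int
vfork_exit_status (int code)
{
  assert (code >= 0 && code <= 255);

  pid_t pid = vfork ();
  if (pid == -1)
    return -1;
  if (pid == 0)
    /* Child: it borrows the parent's stack, so it must only call
       _exit or an exec function and never return from here.  */
    _exit (code);

  /* Parent: resumes once the child has exited.  */
  int status;
  if (waitpid (pid, &status, 0) != pid || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```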
|
|
There are no new constants covered by tst-mman-consts.py,
tst-mount-consts.py or tst-pidfd-consts.py in Linux 6.12 that need any
header changes, so update the kernel version in those tests.
(tst-sched-consts.py will need updating separately along with adding
SCHED_EXT.)
Tested with build-many-glibcs.py.
|
|
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic tanhf.
The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
Latency                master    patched   improvement
x86_64                 51.5273   41.0951   20.25%
x86_64v2               47.7021   39.1526   17.92%
x86_64v3               45.0373   34.2737   23.90%
i686                   133.9970  83.8596   37.42%
aarch64 (Neoverse)     21.5439   14.7961   31.32%
power10                13.3301   8.4406    36.68%
reciprocal-throughput  master    patched   improvement
x86_64                 24.9493   12.8547   48.48%
x86_64v2               20.7051   12.7761   38.29%
x86_64v3               19.2492   11.0851   42.41%
i686                   78.6498   29.8211   62.08%
aarch64 (Neoverse)     11.6026   7.11487   38.68%
power10                6.3328    2.8746    54.61%
Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
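The "improvement" column in these benchmark tables is consistent with the relative change (master - patched) / master, expressed in percent, so a negative value means the patched version is slower. A minimal sketch for recomputing it, with a hypothetical function name:

```c
#include <assert.h>

/* Relative improvement in percent.  Positive when PATCHED is smaller
   (faster) than MASTER, negative when it is larger (slower).  */
static double
improvement_percent (double master, double patched)
{
  assert (master > 0.0);
  return (master - patched) / master * 100.0;
}
```

For example, `improvement_percent (51.5273, 41.0951)` gives about 20.25, matching the x86_64 latency row of the tanhf table.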
|
|
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic sinhf.
The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
Latency                master    patched   improvement
x86_64                 52.6819   49.1489   6.71%
x86_64v2               49.1162   42.9447   12.57%
x86_64v3               46.9732   39.9157   15.02%
i686                   141.1470  129.6410  8.15%
aarch64 (Neoverse)     20.8539   17.1288   17.86%
power10                14.5258   9.1906    36.73%
reciprocal-throughput  master    patched   improvement
x86_64                 27.5553   23.9395   13.12%
x86_64v2               21.6423   20.3219   6.10%
x86_64v3               21.4842   16.0224   25.42%
i686                   87.9709   86.1626   2.06%
aarch64 (Neoverse)     15.1919   12.2744   19.20%
power10                7.2188    5.2611    27.12%
Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The CORE-MATH implementation is correctly rounded (for any rounding mode),
although it shows worse performance than the current one. The current
implementation's performance comes mainly from its internal use of
the optimized expf implementation, and it has a maximum error of 2 ULPs
for FE_TONEAREST and 3 ULPs for other rounding modes.
The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
Latency                master    patched   improvement
x86_64                 40.6995   49.0737   -20.58%
x86_64v2               40.5841   44.3604   -9.30%
x86_64v3               39.3879   39.7502   -0.92%
i686                   112.3380  129.8570  -15.59%
aarch64 (Neoverse)     18.6914   17.0946   8.54%
power10                11.1343   9.3245    16.25%
reciprocal-throughput  master    patched   improvement
x86_64                 18.6471   24.1077   -29.28%
x86_64v2               17.7501   20.2946   -14.34%
x86_64v3               17.8262   17.1877   3.58%
i686                   64.1454   86.5645   -34.95%
aarch64 (Neoverse)     9.77226   12.2314   -25.16%
power10                4.0200    5.3316    -32.63%
Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic atanhf.
The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
Latency                master    patched   improvement
x86_64                 59.4930   45.8568   22.92%
x86_64v2               59.5705   45.5804   23.48%
x86_64v3               53.1838   37.7155   29.08%
i686                   169.354   133.5940  21.12%
aarch64 (Neoverse)     26.0781   16.9829   34.88%
power10                15.6591   10.7623   31.27%
reciprocal-throughput  master    patched   improvement
x86_64                 23.5903   18.5766   21.25%
x86_64v2               22.6489   18.2683   19.34%
x86_64v3               19.0401   13.9474   26.75%
i686                   97.6034   107.3260  -9.96%
aarch64 (Neoverse)     15.3664   9.57846   37.67%
power10                6.8877    4.6242    32.86%
Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic atan2f.
The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
Latency                master    patched   improvement
x86_64                 68.1175   69.2014   -1.59%
x86_64v2               66.9884   66.0081   1.46%
x86_64v3               57.7034   61.6407   -6.82%
i686                   189.8690  152.7560  19.55%
aarch64 (Neoverse)     32.6151   24.5382   24.76%
power10                21.7282   17.1896   20.89%
reciprocal-throughput  master    patched   improvement
x86_64                 34.5202   31.6155   8.41%
x86_64v2               32.6379   30.3372   7.05%
x86_64v3               34.3677   23.6455   31.20%
i686                   157.7290  75.8308   51.92%
aarch64 (Neoverse)     27.7788   16.2671   41.44%
power10                15.5715   8.1588    47.60%
Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic atanf.
The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
Latency                master    patched   improvement
x86_64                 56.8265   53.6842   5.53%
x86_64v2               54.8177   53.6842   2.07%
x86_64v3               46.2915   48.7034   -5.21%
i686                   158.3760  108.9560  31.20%
aarch64 (Neoverse)     21.687    20.5893   5.06%
power10                13.1903   13.5012   -2.36%
reciprocal-throughput  master    patched   improvement
x86_64                 16.6787   16.7601   -0.49%
x86_64v2               16.6983   16.7601   -0.37%
x86_64v3               16.2268   12.1391   25.19%
i686                   138.6840  36.0640   74.00%
aarch64 (Neoverse)     11.8012   10.3565   12.24%
power10                5.3212    4.2894    19.39%
Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic asinhf.
The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
Latency                master    patched   improvement
x86_64                 64.5128   56.9717   11.69%
x86_64v2               63.3065   57.2666   9.54%
x86_64v3               62.8719   51.4170   18.22%
i686                   189.1630  137.635   27.24%
aarch64 (Neoverse)     25.3551   20.5757   18.85%
power10                17.9712   13.3302   25.82%
reciprocal-throughput  master    patched   improvement
x86_64                 20.0844   15.4731   22.96%
x86_64v2               19.2919   15.4000   20.17%
x86_64v3               18.7226   11.9009   36.44%
i686                   103.7670  80.2681   22.65%
aarch64 (Neoverse)     12.5005   8.68969   30.49%
power10                7.2220    5.03617   30.27%
Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic asinf.
The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
Latency                master    patched   improvement
x86_64                 42.8237   35.2460   17.70%
x86_64v2               43.3711   35.9406   17.13%
x86_64v3               35.0335   30.5744   12.73%
i686                   213.8780  104.4710  51.15%
aarch64 (Neoverse)     17.2937   13.6025   21.34%
power10                12.0227   7.4241    38.25%
reciprocal-throughput  master    patched   improvement
x86_64                 13.6770   15.5231   -13.50%
x86_64v2               13.8722   16.0446   -15.66%
x86_64v3               13.6211   13.2753   2.54%
i686                   186.7670  45.4388   75.67%
aarch64 (Neoverse)     9.96089   9.39285   5.70%
power10                4.9862    3.7819    24.15%
Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic acoshf.
The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
Latency                master    patched   improvement
x86_64                 61.2471   58.7742   4.04%
x86_64-v2              62.6519   59.0523   5.75%
x86_64-v3              58.7408   50.1393   14.64%
aarch64                24.8580   21.3317   14.19%
power10                17.0469   13.1345   22.95%
reciprocal-throughput  master    patched   improvement
x86_64                 16.1618   15.1864   6.04%
x86_64-v2              15.7729   14.7563   6.45%
x86_64-v3              14.1669   11.9568   15.60%
aarch64                10.911    9.5486    12.49%
power10                6.38196   5.06734   20.60%
Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic acosf.
The code was adapted to glibc style and to use the definitions in
math_config.h (to handle errno, overflow, and underflow).
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
Latency                master    patched   improvement
x86_64                 52.5098   36.6312   30.24%
x86_64v2               53.0217   37.3091   29.63%
x86_64v3               42.8501   32.3977   24.39%
i686                   207.3960  109.4000  47.25%
aarch64                21.3694   13.7871   35.48%
power10                14.5542   7.2891    49.92%
reciprocal-throughput  master    patched   improvement
x86_64                 14.1487   15.9508   -12.74%
x86_64v2               14.3293   16.1899   -12.98%
x86_64v3               13.6563   12.6161   7.62%
i686                   158.4060  45.7354   71.13%
aarch64                12.5515   9.19233   26.76%
power10                5.7868    3.3487    42.13%
Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The predefined pi constants do not have the expected value for carg
in non-default rounding modes (similar to atan). Use the
autogenerated value instead.
|
|
The predefined pi constants do not have the expected value for atan2
in non-default rounding modes. Use the autogenerated value instead.
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The M_PI_2 (lit_pi_2_d) constant is not the expected value for atanf
on non-default rounding modes. Instead use the autogenerated value.
|
|
For some correctly rounded functions where an infinite input produces
a finite result (like atanf), comparing against a predefined constant
does not yield the expected result in all rounding modes.
The most straightforward way to handle this is to get the expected
result from mpfr, which handles all the rounding modes.
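A small sketch of why a single hard-coded constant cannot be right in every rounding mode, using pi/2 in float as the example (the limit of atanf at +infinity). The float nearest to pi/2 lies above the true value, so a correctly rounded result in a downward or toward-zero mode would be the next float below it. The names here are hypothetical, and the neighbor is computed on the bit pattern so that the sketch does not depend on libm or on changing the rounding mode.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* pi/2 rounded to double, precise enough to order the float values.  */
static const double pi_2 = 1.5707963267948966;

/* pi/2 rounded to the nearest float, as a test might hard-code it.  */
static const float pi_2_nearest = 0x1.921fb6p+0f;

/* Return the largest float strictly below X, for positive X.  For
   positive floats, decrementing the bit pattern steps to the previous
   representable value.  */
static float
float_below (float x)
{
  assert (x > 0.0f);
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  bits--;
  memcpy (&x, &bits, sizeof x);
  return x;
}
```

Here `pi_2_nearest` compares greater than `pi_2`, while `float_below (pi_2_nearest)`, which is 0x1.921fb4p+0f, compares less, so the two directed roundings of pi/2 are different floats.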
|
|
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
Random inputs in the range [-10,10].
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
Random inputs in the range [-10,10].
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
Random inputs in the range [-10,10].
Reviewed-by: DJ Delorie <dj@redhat.com>
|