path: root/sysdeps
2024-12-05  x86-64: Update libm-test-ulps  (H.J. Lu; 1 file changed, -5/+5)
Update x86-64 libm-test-ulps to fix

  FAIL: math/test-float64x-cospi
  FAIL: math/test-float64x-exp2m1
  FAIL: math/test-float64x-sinpi
  FAIL: math/test-ldouble-cospi
  FAIL: math/test-ldouble-exp2m1
  FAIL: math/test-ldouble-sinpi

when building glibc with GCC 7.4.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-12-04  Implement C23 sinpi  (Joseph Myers; 38 files changed, -0/+285)
C23 adds various <math.h> function families originally defined in TS 18661-4. Add the sinpi functions (sin(pi*x)). Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-04  Implement C23 cospi  (Joseph Myers; 38 files changed, -0/+287)
C23 adds various <math.h> function families originally defined in TS 18661-4. Add the cospi functions (cos(pi*x)). Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-12-04  malloc: Optimize small memory clearing for calloc  (H.J. Lu; 1 file changed, -0/+49)
Add calloc-clear-memory.h to clear memory of size up to 36 bytes (72 bytes on 64-bit targets) for calloc, using repeated stores with 1 branch instead of up to 3 branches. On x86-64, it is faster than memset, since calling memset needs 1 indirect branch, 1 broadcast, and up to 4 branches.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2024-12-02  elf: Consolidate stackinfo.h  (Adhemerval Zanella; 7 files changed, -171/+13)
And use a sane default for the generic implementation.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-11-29  Add threaded test of sem_trywait  (Joseph Myers; 2 files changed, -0/+78)
All the existing glibc tests of sem_trywait are single-threaded. Add one that calls sem_trywait and sem_post in separate threads. Tested for x86_64.
2024-11-29  nptl: Add new test for pthread_spin_trylock  (Sergey Kolosov; 2 files changed, -0/+83)
Add a threaded test for pthread_spin_trylock attempting to lock an already acquired spin lock and checking for the correct return code.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-11-29  AArch64: Remove zva_128 from memset  (Wilco Dijkstra; 1 file changed, -24/+1)
Remove ZVA 128 support from memset - the new memset no longer guarantees count >= 256, which can result in underflow and a crash if ZVA size is 128 ([1]). Since only one CPU uses a ZVA size of 128 and its memcpy implementation was removed in commit e162ab2bf1b82c40f29e1925986582fa07568ce8, remove this special case too. [1] https://sourceware.org/pipermail/libc-alpha/2024-November/161626.html Reviewed-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-11-28  Remove nios2-linux-gnu  (Adhemerval Zanella; 110 files changed, -8528/+113)
GCC 15 (e876acab6cdd84bb2b32c98fc69fb0ba29c81153) and binutils (e7a16d9fd65098045ef5959bf98d990f12314111) have both removed all Nios II support, and the architecture has been EOL'ed by the vendor. The kernel still has support, but without a proper compiler there is not much sense in keeping it in glibc.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-11-26  math: Add internal roundeven_finite  (Adhemerval Zanella; 5 files changed, -3/+35)
Some CORE-MATH routines use roundeven, and most ISAs do not have a specific instruction for the operation. In that case, the call is routed to the generic implementation. However, if the ISA does support round() and ctz(), there is a better alternative (as used by CORE-MATH). This patch adds such an optimization and also enables it on powerpc. On a power10 it shows the following improvement:

expm1f                  master   patched   improvement
latency                 9.8574   7.0139    28.85%
reciprocal-throughput   4.3742   2.6592    39.21%

Checked on powerpc64le-linux-gnu and aarch64-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-25  RISC-V: Use builtin for fma and fmaf  (Julian Zhu; 3 files changed, -68/+4)
The built-in functions `builtin_{fma, fmaf}` are sufficient to generate correct `fmadd.d`/`fmadd.s` instructions on RISC-V. Signed-off-by: Julian Zhu <jz531210@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-11-25  RISC-V: Use builtin for copysign and copysignf  (Julian Zhu; 2 files changed, -58/+0)
The built-in functions `builtin_{copysign, copysignf}` are sufficient to generate correct `fsgnj.d/fsgnj.s` instructions on RISC-V. Signed-off-by: Julian Zhu <jz531210@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-11-25  Silence most -Wzero-as-null-pointer-constant diagnostics  (Alejandro Colomar; 10 files changed, -15/+15)
Replace 0 by NULL and {0} by {}. Omit a few cases that aren't so trivial to fix. Link: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117059> Link: <https://software.codidact.com/posts/292718/292759#answer-292759> Signed-off-by: Alejandro Colomar <alx@kernel.org>
2024-11-25  sysdeps: linux: Fix output of LD_SHOW_AUXV=1 for AT_RSEQ_*  (Yannick Le Pennec; 1 file changed, -0/+2)
The constants themselves were added to elf.h back in 8754a4133e but the array in _dl_show_auxv wasn't modified accordingly, resulting in the following output when running LD_SHOW_AUXV=1 /bin/true on recent Linux:

  AT_??? (0x1b): 0x1c
  AT_??? (0x1c): 0x20

With this patch:

  AT_RSEQ_FEATURE_SIZE: 28
  AT_RSEQ_ALIGN:        32

Tested on Linux 6.11 x86_64

Signed-off-by: Yannick Le Pennec <yannick.lepennec@live.fr>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-11-25  nptl: initialize cpu_id_start prior to rseq registration  (Michael Jeanson; 1 file changed, -0/+1)
When adding explicit initialization of rseq fields prior to registration, I glossed over the fact that 'cpu_id_start' is also documented as initialized by user-space. While current kernels don't validate the content of this field on registration, future ones could. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2024-11-25  math: Fix branch hint for 68d7128942  (Adhemerval Zanella; 1 file changed, -1/+1)
2024-11-25  powerpc64le: ROP Changes for strncpy/ppc-mount  (Sachin Monga; 4 files changed, -28/+46)
Add ROP protect instructions to strncpy and ppc-mount functions. Modify FRAME_MIN_SIZE to 48 bytes for ELFv2 to reserve additional 16 bytes for ROP save slot and padding. Signed-off-by: Sachin Monga <smonga@linux.ibm.com> Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-11-25  math: Fix non-portability in the computation of signgam in lgammaf  (Vincent Lefevre; 1 file changed, -5/+4)
The k>>31 in

  signgam = 1 - (((k&(k>>31))&1)<<1);

is not portable:

* The ISO C standard says "If E1 has a signed type and a negative value, the resulting value is implementation-defined." (this is still in C23).
* If the int type is larger than 32 bits (e.g. a 64-bit type), then k = INT_MAX; on line 144 will make k>>31 put 1 in bit 0 (thus signgam will be -1) while 0 is expected.

Moreover, instead of the fx >= 0x1p31f condition, testing fx >= 0 is probably better for 2 reasons:

* The signgam expression has more or less a condition on the sign of fx (the goal of k>>31, which can be dropped with this new condition). Since fx >= 0 should be the most common case, one can get signgam directly in this case (value 1). And this simplifies the expression for the other case (fx < 0).
* This new condition may be easier/faster to test on the processor (e.g. by avoiding a load of a constant from memory).

This is commit d41459c731865516318f813cf4c966dafa0eecbf from CORE-MATH.

Checked on x86_64-linux-gnu.
2024-11-25  hurd: Add MAP_NORESERVE mmap flag  (Samuel Thibault; 1 file changed, -0/+1)
This is already the current default behavior, which we will change with overcommit support addition.
2024-11-22  Add multithreaded test of sem_getvalue  (Joseph Myers; 2 files changed, -0/+186)
Test coverage of sem_getvalue is fairly limited. Add a test that runs it on threads on each CPU. For this purpose I adapted tst-skeleton-thread-affinity.c; it didn't seem very suitable to use as-is or include directly in a different test doing things per-CPU, but did seem a suitable starting point (thus sharing tst-skeleton-affinity.c) for such testing. Tested for x86_64.
2024-11-22  math: Use tanf from CORE-MATH  (Adhemerval Zanella; 28 files changed, -270/+315)
The CORE-MATH implementation is correctly rounded (for any rounding mode) and shows better performance than the generic tanf. The code was adapted to glibc style, to use the definitions of math_config.h, to remove errno handling, and to use a generic 128-bit routine for ABIs that do not support it natively.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (neoverse1, gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                 master    patched   improvement
x86_64                  82.3961   54.8052   33.49%
x86_64v2                82.3415   54.8052   33.44%
x86_64v3                69.3661   50.4864   27.22%
i686                    219.271   45.5396   79.23%
aarch64                 29.2127   19.1951   34.29%
power10                 19.5060   16.2760   16.56%

reciprocal-throughput   master    patched   improvement
x86_64                  28.3976   19.7334   30.51%
x86_64v2                28.4568   19.7334   30.65%
x86_64v3                21.1815   16.1811   23.61%
i686                    105.016   15.1426   85.58%
aarch64                 18.1573   10.7681   40.70%
power10                 8.7207    8.7097    0.13%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22  math: Use lgammaf from CORE-MATH  (Adhemerval Zanella; 29 files changed, -601/+349)
The CORE-MATH implementation is correctly rounded (for any rounding mode) and shows better performance than the generic lgammaf. The code was adapted to glibc style, to use the definitions of math_config.h, to remove errno handling, to use math_narrow_eval on overflow usage, and to make it reentrant.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1, gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                 master    patched   improvement
x86_64                  86.5609   70.3278   18.75%
x86_64v2                78.3030   69.9709   10.64%
x86_64v3                74.7470   59.8457   19.94%
i686                    387.355   229.761   40.68%
aarch64                 40.8341   33.7563   17.33%
power10                 26.5520   16.1672   39.11%
powerpc                 28.3145   17.0625   39.74%

reciprocal-throughput   master    patched   improvement
x86_64                  68.0461   48.3098   29.00%
x86_64v2                55.3256   47.2476   14.60%
x86_64v3                52.3015   38.9028   25.62%
i686                    340.848   195.707   42.58%
aarch64                 36.8000   30.5234   17.06%
power10                 20.4043   12.6268   38.12%
powerpc                 22.6588   13.8866   38.71%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22  math: Use erfcf from CORE-MATH  (Adhemerval Zanella; 26 files changed, -249/+174)
The CORE-MATH implementation is correctly rounded (for any rounding mode) and shows better performance than the generic erfcf. The code was adapted to glibc style and to use the definitions of math_config.h.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1, gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                 master    patched   improvement
x86_64                  98.8796   66.2142   33.04%
x86_64v2                98.9617   67.4221   31.87%
x86_64v3                87.4161   53.1754   39.17%
aarch64                 33.8336   22.0781   34.75%
power10                 21.1750   13.5864   35.84%
powerpc                 21.4694   13.8149   35.65%

reciprocal-throughput   master    patched   improvement
x86_64                  48.5620   27.6731   43.01%
x86_64v2                47.9497   28.3804   40.81%
x86_64v3                42.0255   18.1355   56.85%
aarch64                 24.3938   13.4041   45.05%
power10                 10.4919   6.1881    41.02%
powerpc                 11.763    6.76468   42.49%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22  math: Use erff from CORE-MATH  (Adhemerval Zanella; 27 files changed, -235/+247)
The CORE-MATH implementation is correctly rounded (for any rounding mode) and shows better performance than the generic erff. The code was adapted to glibc style and to use the definitions of math_config.h.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1, gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                 master    patched   improvement
x86_64                  85.7363   45.1372   47.35%
x86_64v2                86.6337   38.5816   55.47%
x86_64v3                71.3810   34.0843   52.25%
i686                    190.143   97.5014   48.72%
aarch64                 34.9091   14.9320   57.23%
power10                 38.6160   8.5188    77.94%
powerpc                 39.7446   8.45781   78.72%

reciprocal-throughput   master    patched   improvement
x86_64                  35.1739   14.7603   58.04%
x86_64v2                34.5976   11.2283   67.55%
x86_64v3                27.3260   9.8550    63.94%
i686                    91.0282   30.8840   66.07%
aarch64                 22.5831   6.9615    69.17%
power10                 18.0386   3.0918    82.86%
powerpc                 20.7277   3.63396   82.47%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22  math: Split s_erfF in erff and erfc  (Adhemerval Zanella; 7 files changed, -78/+178)
So we can eventually replace each implementation. Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22  math: Use cbrtf from CORE-MATH  (Adhemerval Zanella; 25 files changed, -134/+87)
The CORE-MATH implementation is correctly rounded (for any rounding mode) and shows better performance than the generic cbrtf. The code was adapted to glibc style and to use the definitions of math_config.h.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1, gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                 master    patched   improvement
x86_64                  68.6348   36.8908   46.25%
x86_64v2                67.3418   36.6968   45.51%
x86_64v3                63.4981   32.7859   48.37%
aarch64                 29.3172   12.1496   58.56%
power10                 18.0845   8.8893    50.85%
powerpc                 18.0859   8.79527   51.37%

reciprocal-throughput   master    patched   improvement
x86_64                  36.4369   13.3565   63.34%
x86_64v2                37.3611   13.1149   64.90%
x86_64v3                31.6024   11.2102   64.53%
aarch64                 18.6866   7.3474    60.68%
power10                 9.4758    3.6329    61.66%
powerpc                 9.58896   3.90439   59.28%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-11-21  x86/string: Use `movsl` instead of `movsd` in strncat [BZ #32344]  (Siddhesh Poyarekar; 1 file changed, -2/+2)
The previous patch missed strncat; fix that.

Resolves: BZ #32344

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2024-11-21  aarch64: Remove non-temporal load/stores from oryon-1's memset  (Andrew Pinski; 1 file changed, -26/+0)
The hardware architects have a new recommendation not to use non-temporal load/stores for memset. This patch removes this path. I found there was no difference in the memset speed with/without non-temporal load/stores either. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-11-21  aarch64: Remove non-temporal load/stores from oryon-1's memcpy  (Andrew Pinski; 1 file changed, -40/+0)
The hardware architects have a new recommendation not to use non-temporal load/stores for memcpy. This patch removes this path. I found there was no difference in the memcpy speed with/without non-temporal load/stores either. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-11-20  powerpc64le: _init/_fini file changes for ROP  (Sachin Monga; 3 files changed, -1/+14)
The ROP instructions were added in ISA 3.1 (ie, Power10), however they were defined so that if executed on older cpus, they would behave as nops. This allows us to emit them on older cpus and they'd just be ignored, but if run on a Power10, then the binary would be ROP protected. Hash instructions use negative offsets so the default position of ROP pointer is FRAME_ROP_SAVE from caller's SP. Modified FRAME_MIN_SIZE_PARM to 112 for ELFv2 to reserve additional 16 bytes for ROP save slot and padding. Signed-off-by: Sachin Monga <smonga@linux.ibm.com> Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-11-20  AArch64: Add support for memory protection keys  (Yury Khrustalev; 8 files changed, -11/+477)
This patch adds support for memory protection keys on AArch64 systems with the Stage 1 permission overlays feature introduced in Armv8.9 / 9.4 (FEAT_S1POE) [1]:

1. Internal functions "pkey_read" and "pkey_write" to access data associated with memory protection keys.
2. Implementation of the API functions "pkey_get" and "pkey_set" for the AArch64 target.
3. AArch64-specific PKEY flags for READ and EXECUTE (see below).
4. A new target-specific test that checks the behaviour of pkeys on AArch64 targets.
5. An extension of the existing generic test for pkeys.
6. A HWCAP constant for the Permission Overlay Extension feature.

To support more accurate mapping of underlying permissions to the PKEY flags, we introduce additional AArch64-specific flags. The full list of flags is:

  PKEY_UNRESTRICTED:    0x0 (for completeness)
  PKEY_DISABLE_ACCESS:  0x1 (existing flag)
  PKEY_DISABLE_WRITE:   0x2 (existing flag)
  PKEY_DISABLE_EXECUTE: 0x4 (new flag, AArch64 specific)
  PKEY_DISABLE_READ:    0x8 (new flag, AArch64 specific)

The problem here is that PKEY_DISABLE_ACCESS has unusual semantics, as it overlaps with the existing PKEY_DISABLE_WRITE and the new PKEY_DISABLE_READ. For this reason, the mapping between permission bits RWX and "restriction" bits awxr (a for disable access, etc.) becomes complicated:

- PKEY_DISABLE_ACCESS disables both R and W
- PKEY_DISABLE_{WRITE,READ} disables W and R respectively
- PKEY_DISABLE_EXECUTE disables X

Combinations like the one below are accepted although they are redundant:

- PKEY_DISABLE_ACCESS | PKEY_DISABLE_READ | PKEY_DISABLE_WRITE

The reverse mapping tries to retain backward compatibility and ORs in PKEY_DISABLE_ACCESS whenever both PKEY_DISABLE_READ and PKEY_DISABLE_WRITE would be present. This will break code that compares pkey_get output with == instead of using bitwise operations; the latter is more correct, since the PKEY_* constants are essentially bit flags. It should be noted that PKEY_DISABLE_ACCESS does not prevent execution.
[1] https://developer.arm.com/documentation/ddi0487/ka/ section D8.4.1.4 Co-authored-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-11-20  AArch64: Remove thunderx{,2} memcpy  (Andrew Pinski; 7 files changed, -792/+0)
ThunderX1 and ThunderX2 have been retired for a few years now, so remove the thunderx{,2}-specific versions of memcpy. The performance gain from them was for medium and large sizes, which the generic (aarch64) memcpy handles only slightly worse.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2024-11-19  Fix femode_t conditionals for arc and or1k  (Joseph Myers; 2 files changed, -2/+2)
Two of the architecture bits/fenv.h headers define femode_t if __GLIBC_USE (IEC_60559_BFP_EXT), instead of the correct condition __GLIBC_USE (IEC_60559_BFP_EXT_C23) (both were added after commit 0175c9e9be5f0b2000859666b6e1ef3696f1123b, but were probably first developed before it and then not updated to take account of its changes).

This results in failures of the installed headers check for fenv.h when building with GCC 15 (which defaults to -std=gnu23; we don't yet have an installed-headers test specifically for C23 mode and don't yet require a compiler with such a mode for building glibc) together with a combination of options leaving C23 features enabled, since the declarations of functions using femode_t use the correct conditions; see <https://sourceware.org/pipermail/libc-testresults/2024q4/013163.html>. Fix the conditionals to get <fenv.h> to work correctly in C23 mode again.

Tested with build-many-glibcs.py (arc-linux-gnu, arc-linux-gnuhf, or1k-linux-gnu-hard, or1k-linux-gnu-soft).
2024-11-19  powerpc64le: Optimized strcat for POWER10  (Mahesh Bodapati; 4 files changed, -9/+56)
This patch adds an optimized strcat that uses the default strcat implementation, which calls the POWER10 strcpy and strlen routines.
2024-11-19  powerpc: Improve the inline asm for syscall wrappers  (Peter Bergner; 1 file changed, -20/+22)
Update the inline asm syscall wrappers to match the newer register constraint usage in INTERNAL_VSYSCALL_CALL_TYPE. Use the faster mfocrf instruction when available, rather than the slower mfcr microcoded instruction.
2024-11-19  htl: move pthread_attr_init into libc.  (gfleury; 6 files changed, -5/+10)
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19  htl: move pthread_attr_setguardsize into libc.  (gfleury; 5 files changed, -3/+11)
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19  htl: move pthread_attr_setschedparam into libc.  (gfleury; 5 files changed, -8/+6)
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19  htl: move pthread_attr_setscope into libc.  (gfleury; 5 files changed, -5/+6)
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19  htl: move pthread_attr_setstackaddr into libc.  (gfleury; 7 files changed, -7/+24)
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19  htl: move pthread_attr_setstacksize into libc.  (gfleury; 6 files changed, -3/+12)
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19  htl: move pthread_attr_getstack into libc.  (gfleury; 6 files changed, -3/+12)
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19  htl: move pthread_attr_getstackaddr into libc.  (gfleury; 6 files changed, -3/+12)
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19  htl: move pthread_attr_getstacksize into libc.  (gfleury; 6 files changed, -3/+12)
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19  htl: move pthread_attr_getscope into libc.  (gfleury; 5 files changed, -5/+6)
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19  htl: move pthread_attr_getguardsize into libc.  (gfleury; 5 files changed, -3/+11)
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19  htl: move __pthread_default_attr into libc  (gfleury; 1 file changed, -0/+1)
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-19  htl: move pthread_attr_destroy into libc.  (gfleury; 5 files changed, -5/+7)
Signed-off-by: gfleury <gfleury@disroot.org>
2024-11-13  x86/string: Use `movsl` instead of `movsd` in strncpy/strncat [BZ #32344]  (Noah Goldstein; 1 file changed, -1/+1)
`ld`, starting at 2.40, emits a warning when using `movsd`. There is no change to the actual code produced. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-11-12  hppa: Update libm-test-ulps  (John David Anglin; 1 file changed, -0/+3)
Update imaginary part of csin. Signed-off-by: John David Anglin <dave.anglin@bell.net>