aboutsummaryrefslogtreecommitdiff
path: root/sysdeps
AgeCommit message (Collapse)AuthorFilesLines
2018-10-17x86: Use _rdtsc intrinsic for HP_TIMING_NOWH.J. Lu7-55/+65
Since _rdtsc intrinsic is supported in GCC 4.9, we can use it for HP_TIMING_NOW. This patch 1. Create x86 hp-timing.h to replace i686 and x86_64 hp-timing.h. 2. Move MINIMUM_ISA from init-arch.h to isa.h so that x86 hp-timing.h can check minimum x86 ISA to decide if _rdtsc can be used. NB: Checking if __i686__ isn't sufficient since __i686__ may not be defined when building for i686 class processors. * sysdeps/i386/init-arch.h: Removed. * sysdeps/i386/i586/init-arch.h: Likewise. * sysdeps/i386/i686/init-arch.h: Likewise. * sysdeps/i386/i686/hp-timing.h: Likewise. * sysdeps/x86_64/hp-timing.h: Likewise. * sysdeps/i386/isa.h: New file. * sysdeps/i386/i586/isa.h: Likewise. * sysdeps/i386/i686/isa.h: Likewise. * sysdeps/x86_64/isa.h: Likewise. * sysdeps/x86/hp-timing.h: New file. * sysdeps/x86/init-arch.h: Include <isa.h>.
2018-10-17Use single bits/shm.h for all architectures.Joseph Myers13-552/+215
After my patch to move SHMLBA to its own header, the bits/shm.h headers for architectures using the Linux kernel still vary in a few ways: the use of __syscall_ulong_t; whether padding for 32-bit systems is present before or after time fields, or missing altogether (mips, x32); whether shm_segsz is before or after the time fields; whether, if after time fields, there is extra padding before shm_segsz. This patch arranges for a single header to be used. __syscall_ulong_t is safe to use everywhere, while bits/shm-pad.h is added with new macros __SHM_PAD_AFTER_TIME, __SHM_PAD_BEFORE_TIME, __SHM_SEGSZ_AFTER_TIME and __SHM_PAD_BETWEEN_TIME_AND_SEGSZ to describe the differences. Tested for x86_64 and x86, and with build-many-glibcs.py. * sysdeps/unix/sysv/linux/Makefile (sysdep_headers): Add bits/shm-pad.h. * sysdeps/unix/sysv/linux/bits/shm.h: Include <bits/shm-pad.h>. (shmatt_t): Define as __syscall_ulong_t. (__SHM_PAD_TIME): New macro, depending on [__SHM_PAD_BEFORE_TIME] and [__SHM_PAD_AFTER_TIME]. (struct shmid_ds): Define time fields using __SHM_PAD_TIME. Define shm_segsz and associated padding based on [__SHM_SEGSZ_AFTER_TIME] and [__SHM_PAD_BETWEEN_TIME_AND_SEGSZ]. Use __syscall_ulong_t instead of unsigned long int. [__USE_MISC] (struct shminfo): Use __syscall_ulong_t instead of unsigned long int. [__USE_MISC] (struct shm_info): Likewise. * sysdeps/unix/sysv/linux/bits/shm-pad.h: New file. * sysdeps/unix/sysv/linux/hppa/bits/shm-pad.h: Likewise. * sysdeps/unix/sysv/linux/mips/bits/shm-pad.h: Likewise. * sysdeps/unix/sysv/linux/powerpc/bits/shm-pad.h: Likewise. * sysdeps/unix/sysv/linux/sparc/bits/shm-pad.h: Likewise. * sysdeps/unix/sysv/linux/x86/bits/shm-pad.h: Likewise. * sysdeps/unix/sysv/linux/hppa/bits/shm.h: Remove. * sysdeps/unix/sysv/linux/mips/bits/shm.h: Likewise. * sysdeps/unix/sysv/linux/powerpc/bits/shm.h: Likewise. * sysdeps/unix/sysv/linux/sparc/bits/shm.h: Likewise. * sysdeps/unix/sysv/linux/x86/bits/shm.h: Likewise.
2018-10-17Move SHMLBA to its own header.Joseph Myers17-325/+190
One difference between bits/shm.h headers for architectures using the Linux kernel is the definition of SHMLBA. This was noted in <https://sourceware.org/ml/libc-alpha/2018-09/msg00175.html> as a reason why even a new architecture (C-SKY) might need its own bits/shm.h; thus, splitting it out of bits/shm.h can allow less duplication of headers for new architectures. This patch moves that definition to its own header, bits/shmlba.h, to allow more sharing of headers between architectures. That move allows the arm, ia64 and sh variants of bits/shm.h to be removed, as they had no other significant differences from the generic bits/shm.h; powerpc and x86 have their own bits/shm.h but do not need to get their own bits/shmlba.h because they use the same SHMLBA as the generic header. Other architectures with their own bits/shm.h get their own bits/shmlba.h without being able to remove their own bits/shm.h until the generic one has been adapted to be able to handle more architectures (where, in addition to the differences seen for bits/msq.h and bits/sem.h, the position of shm_segsz in struct shmid_ds also depends on the architecture). Tested for x86_64 and x86, and with build-many-glibcs.py. * sysdeps/unix/sysv/linux/Makefile (sysdep_headers): Add bits/shmlba.h. * sysdeps/unix/sysv/linux/bits/shm.h: Include <bits/shmlba.h>. (SHMLBA): Remove macro. (__getpagesize): Remove function declaration. * sysdeps/unix/sysv/linux/hppa/bits/shm.h: Include <bits/shmlba.h>. (SHMLBA): Remove macro. * sysdeps/unix/sysv/linux/mips/bits/shm.h: Include <bits/shmlba.h>. (SHMLBA): Remove macro. * sysdeps/unix/sysv/linux/powerpc/bits/shm.h: Include <bits/shmlba.h>. (SHMLBA): Remove macro. (__getpagesize): Remove function declaration. * sysdeps/unix/sysv/linux/sparc/bits/shm.h: Include <bits/shmlba.h>. (SHMLBA): Remove macro. (__getshmlba): Remove function declaration. * sysdeps/unix/sysv/linux/x86/bits/shm.h: Include <bits/shmlba.h>. (SHMLBA): Remove macro. (__getpagesize): Remove function declaration. * sysdeps/unix/sysv/linux/arm/bits/shm.h: Remove file. * sysdeps/unix/sysv/linux/ia64/bits/shm.h: Likewise. * sysdeps/unix/sysv/linux/sh/bits/shm.h: Likewise. * sysdeps/unix/sysv/linux/bits/shmlba.h: New file. * sysdeps/unix/sysv/linux/arm/bits/shmlba.h: Likewise. * sysdeps/unix/sysv/linux/hppa/bits/shmlba.h: Likewise. * sysdeps/unix/sysv/linux/ia64/bits/shmlba.h: Likewise. * sysdeps/unix/sysv/linux/mips/bits/shmlba.h: Likewise. * sysdeps/unix/sysv/linux/sh/bits/shmlba.h: Likewise. * sysdeps/unix/sysv/linux/sparc/bits/shmlba.h: Likewise.
2018-10-17Fix race in pthread_mutex_lock while promoting to PTHREAD_MUTEX_ELISION_NP ↵Stefan Liebler4-13/+141
[BZ #23275] The race leads either to pthread_mutex_destroy returning EBUSY or triggering an assertion (See description in bugzilla). This patch is fixing the race by ensuring that the elision path is used in all cases if elision is enabled by the GLIBC_TUNABLES framework. The __kind variable in struct __pthread_mutex_s is accessed concurrently. Therefore we are now using the atomic macros. The new testcase tst-mutex10 is triggering the race on s390x and intel. Presumably also on power, but I don't have access to a power machine with lock-elision. At least the code for power is the same as on the other two architectures. ChangeLog: [BZ #23275] * nptl/tst-mutex10.c: New File. * nptl/Makefile (tests): Add tst-mutex10. (tst-mutex10-ENV): New variable. * sysdeps/unix/sysv/linux/s390/force-elision.h: (FORCE_ELISION): Ensure that elision path is used if elision is available. * sysdeps/unix/sysv/linux/powerpc/force-elision.h (FORCE_ELISION): Likewise. * sysdeps/unix/sysv/linux/x86/force-elision.h: (FORCE_ELISION): Likewise. * nptl/pthreadP.h (PTHREAD_MUTEX_TYPE, PTHREAD_MUTEX_TYPE_ELISION) (PTHREAD_MUTEX_PSHARED): Use atomic_load_relaxed. * nptl/pthread_mutex_consistent.c (pthread_mutex_consistent): Likewise. * nptl/pthread_mutex_getprioceiling.c (pthread_mutex_getprioceiling): Likewise. * nptl/pthread_mutex_lock.c (__pthread_mutex_lock_full) (__pthread_mutex_cond_lock_adjust): Likewise. * nptl/pthread_mutex_setprioceiling.c (pthread_mutex_setprioceiling): Likewise. * nptl/pthread_mutex_timedlock.c (__pthread_mutex_timedlock): Likewise. * nptl/pthread_mutex_trylock.c (__pthread_mutex_trylock): Likewise. * nptl/pthread_mutex_unlock.c (__pthread_mutex_unlock_full): Likewise. * sysdeps/nptl/bits/thread-shared-types.h (struct __pthread_mutex_s): Add comments. * nptl/pthread_mutex_destroy.c (__pthread_mutex_destroy): Use atomic_load_relaxed and atomic_store_relaxed. * nptl/pthread_mutex_init.c (__pthread_mutex_init): Use atomic_store_relaxed.
2018-10-17Don't reduce test timeout to less than defaultAndreas Schwab1-3/+0
This removes all overrides of TIMEOUT that are less than or equal to the default timeout.
2018-10-16Remove extra space at end of line.Steve Ellcey1-1/+1
2018-10-16aarch64: optimized memcpy implementation for thunderx2Anton Youdkevitch2-18/+603
Since aligned loads and stores are huge performance advantage the implementation always tries to do aligned access. Among the cases when src and dst addresses are aligned or unaligned evenly there are cases of not evenly unaligned src and dst. For such cases (if the length is big enough) ext instruction is used to merge-and-shift two memory chunks loaded from two adjacent aligned locations and then the adjusted chunk gets stored to aligned address. Performance gain against the current T2 implementation: memcpy-large: 65K-32M: +40% - +10% memcpy-walk: 128-32M: +20% - +2%
2018-10-15Use single bits/sem.h for all architectures.Joseph Myers13-460/+177
The bits/sem.h headers for architectures using the Linux kernel vary in a few ways: * x32 uses __syscall_ulong_t instead of unsigned long int. * The x86 header uses padding after time fields unconditionally (including for both x86_64 ABIs), not just for 32-bit time (unlike in msqid_ds where there is only padding for 32-bit time). Because this padding is present for x32, and is __syscall_ulong_t there, it does have to be __syscall_ulong_t, not unsigned long int. * The MIPS header never uses padding around time fields, even when 32-bit (unlike in msqid_ds where it has endian-dependent padding for 32-bit time). * Some older 32-bit big-endian architectures have padding before rather than after time fields, although the preferred generic approach is padding after the time fields independent of endianness. (There are also insubstantial differences such as use of unsigned int for padding instead of unsigned long int, which makes no difference to layout since the padding fields using unsigned int are only present on 32-bit architectures.) For the first, __syscall_ulong_t can be used in the generic version as it's the same as unsigned long int everywhere except x32. For the other differences, this patch adds macros __SEM_PAD_BEFORE_TIME and __SEM_PAD_AFTER_TIME in a new bits/sem-pad.h header, so that header is the only one needing to be provided on architectures with differences in this area, and everything else can go in a single common bits/sem.h header. Tested for x86_64 and x86, and with build-many-glibcs.py. * sysdeps/unix/sysv/linux/Makefile (sysdep_headers): Add bits/sem-pad.h. * sysdeps/unix/sysv/linux/bits/sem.h: Include <bits/sem-pad.h> instead of <bits/wordsize.h>. (__SEM_PAD_TIME): New macro, depending on [__SEM_PAD_BEFORE_TIME] and [__SEM_PAD_AFTER_TIME]. (struct semid_ds): Define time fields using __SEM_PAD_TIME. Use __syscall_ulong_t instead of unsigned long int. * sysdeps/unix/sysv/linux/bits/sem-pad.h: New file. * sysdeps/unix/sysv/linux/hppa/bits/sem-pad.h: Likewise. * sysdeps/unix/sysv/linux/mips/bits/sem-pad.h: Likewise. * sysdeps/unix/sysv/linux/powerpc/bits/sem-pad.h: Likewise. * sysdeps/unix/sysv/linux/sparc/bits/sem-pad.h: Likewise. * sysdeps/unix/sysv/linux/x86/bits/sem-pad.h: Likewise. * sysdeps/unix/sysv/linux/hppa/bits/sem.h: Remove. * sysdeps/unix/sysv/linux/mips/bits/sem.h: Likewise. * sysdeps/unix/sysv/linux/powerpc/bits/sem.h: Likewise. * sysdeps/unix/sysv/linux/sparc/bits/sem.h: Likewise. * sysdeps/unix/sysv/linux/x86/bits/sem.h: Likewise.
2018-10-11Use single bits/msq.h for all architectures.Joseph Myers13-449/+189
The bits/msq.h headers for architectures using the Linux kernel vary in a few ways: * x32 uses __syscall_ulong_t instead of unsigned long int. * x32 has 64-bit time_t, so no padding around time fields despite __WORDSIZE == 32. * Some older 32-bit big-endian architectures have padding before rather than after time fields, although the preferred generic approach is padding after the time fields independent of endianness. (There are also insubstantial differences such as use of unsigned int for padding instead of unsigned long int, which makes no difference to layout since the padding fields using unsigned int are only present on 32-bit architectures.) For the first, __syscall_ulong_t can be used in the generic version as it's the same as unsigned long int everywhere except x32. For the other two differences, this patch adds macros __MSQ_PAD_BEFORE_TIME and __MSQ_PAD_AFTER_TIME in a new bits/msq-pad.h header, so that header is the only one needing to be provided on architectures with differences in this area, and everything else can go in a single common bits/msq.h header. Once we have __TIMESIZE, the generic bits/msq-pad.h can change to use that instead of __WORDSIZE, at which point the x86 version of bits/msq-pad.h won't be needed either. Tested for x86_64 and x86, and with build-many-glibcs.py. * sysdeps/unix/sysv/linux/Makefile (sysdep_headers): Add bits/msq-pad.h. * sysdeps/unix/sysv/linux/bits/msq.h: Include <bits/msq-pad.h> instead of <bits/wordsize.h>. (msgqnum_t): Define as __syscall_ulong_t. (msglen_t): Likewise. (__MSQ_PAD_TIME): New macro, depending on [__MSQ_PAD_BEFORE_TIME] and [__MSQ_PAD_AFTER_TIME]. (struct msqid_ds): Define time fields using __MSQ_PAD_TIME. Use __syscall_ulong_t instead of unsigned long int. * sysdeps/unix/sysv/linux/bits/msq-pad.h: New file. * sysdeps/unix/sysv/linux/hppa/bits/msq-pad.h: Likewise. * sysdeps/unix/sysv/linux/mips/bits/msq-pad.h: Likewise. * sysdeps/unix/sysv/linux/powerpc/bits/msq-pad.h: Likewise. * sysdeps/unix/sysv/linux/sparc/bits/msq-pad.h: Likewise. * sysdeps/unix/sysv/linux/x86/bits/msq-pad.h: Likewise. * sysdeps/unix/sysv/linux/hppa/bits/msq.h: Remove. * sysdeps/unix/sysv/linux/mips/bits/msq.h: Likewise. * sysdeps/unix/sysv/linux/powerpc/bits/msq.h: Likewise. * sysdeps/unix/sysv/linux/sparc/bits/msq.h: Likewise. * sysdeps/unix/sysv/linux/x86/bits/msq.h: Likewise.
2018-10-10Use common bits/shm.h for more architectures.Joseph Myers4-324/+7
sysdeps/unix/sysv/linux/bits/shm.h has padding after time fields in struct shmid_ds unconditionally, and thus is only suitable for 32-bit architectures (no 64-bit configurations use this file); sysdeps/unix/sysv/linux/generic/bits/shm.h is substantively the same, except that the padding is conditioned on __WORDSIZE == 32, and so it can be used for 64-bit architectures as well. This patch adds the conditionals to sysdeps/unix/sysv/linux/bits/shm.h. The linux/generic/ version is then no longer needed and so is removed, as are the alpha and s390 versions which are also no longer needed. The other architecture-specific versions have different padding, layout, types or SHMLBA definitions and so are still needed after this change. This is essentially the same change for bits/shm.h as the bits/msq.h patch and the bits/sem.h patch. However, the details of the padding variations for the architectures that aren't changed are not all the same between msqid_ds, shmid_ds and semid_ds. Tested with build-many-glibcs.py. * sysdeps/unix/sysv/linux/bits/shm.h: Include <bits/wordsize.h>. (struct shmid_ds): Condition padding after time fields on [__WORDSIZE == 32]. * sysdeps/unix/sysv/linux/alpha/bits/shm.h: Remove file. * sysdeps/unix/sysv/linux/generic/bits/shm.h: Likewise. * sysdeps/unix/sysv/linux/s390/bits/shm.h: Likewise.
2018-10-10Use common bits/sem.h for more architectures.Joseph Myers5-355/+5
sysdeps/unix/sysv/linux/bits/sem.h has padding after time fields in struct semid_ds unconditionally, and thus is only suitable for 32-bit architectures (no 64-bit configurations use this file); sysdeps/unix/sysv/linux/generic/bits/sem.h is substantively the same, except that the padding is conditioned on __WORDSIZE == 32, and so it can be used for 64-bit architectures as well. This patch adds the conditionals to sysdeps/unix/sysv/linux/bits/sem.h. The linux/generic/ version is then no longer needed and so is removed, as are the alpha, ia64 and s390 versions which are also no longer needed. The other architecture-specific versions have different padding or types and so are still needed after this change. This is essentially the same change for bits/sem.h as the bits/msq.h patch. However, the details of the padding variations for the architectures that aren't changed are not all the same between msqid_ds and semid_ds. Tested with build-many-glibcs.py. * sysdeps/unix/sysv/linux/bits/sem.h: Include <bits/wordsize.h>. (struct semid_ds): Condition padding after time fields on [__WORDSIZE == 32]. * sysdeps/unix/sysv/linux/alpha/bits/sem.h: Remove file. * sysdeps/unix/sysv/linux/generic/bits/sem.h: Likewise. * sysdeps/unix/sysv/linux/ia64/bits/sem.h: Likewise. * sysdeps/unix/sysv/linux/s390/bits/sem.h: Likewise.
2018-10-10Use common bits/msq.h for more architectures.Joseph Myers5-321/+7
sysdeps/unix/sysv/linux/bits/msq.h has padding after time fields in struct msqid_ds unconditionally, and thus is only suitable for 32-bit architectures (no 64-bit configurations use this file); sysdeps/unix/sysv/linux/generic/bits/msq.h is substantively the same, except that the padding is conditioned on __WORDSIZE == 32, and so it can be used for 64-bit architectures as well. This patch adds the conditionals to sysdeps/unix/sysv/linux/bits/msq.h. The linux/generic/ version is then no longer needed and so is removed, as are the alpha, ia64 and s390 versions which are also no longer needed. The other architecture-specific versions have different padding or types and so are still needed after this change. Tested with build-many-glibcs.py. * sysdeps/unix/sysv/linux/bits/msq.h: Include <bits/wordsize.h>. (struct msqid_ds): Condition padding after time fields on [__WORDSIZE == 32]. * sysdeps/unix/sysv/linux/alpha/bits/msq.h: Remove file. * sysdeps/unix/sysv/linux/generic/bits/msq.h: Likewise. * sysdeps/unix/sysv/linux/ia64/bits/msq.h: Likewise. * sysdeps/unix/sysv/linux/s390/bits/msq.h: Likewise.
2018-10-04Use bits/mman-linux.h for hppa.Joseph Myers1-51/+27
hppa currently has a bits/mman.h that does not include bits/mman-linux.h, unlike all other architectures using the Linux kernel. This sort of variation between architectures is generally unhelpful when making global changes for new constants added to new Linux kernel releases. This patch changes hppa to use bits/mman-linux.h, overriding constants with different values as necessary (including with #undef after bits/mman.h inclusion when needed, as already done for alpha). While there could possibly be further improvements through e.g. splitting more sets of definitions into separate bits/ headers, I think this is still an improvement on the current state. diffstat shows 27 lines added, 51 deleted (and some of that is actually existing lines moving to a different place in the file). Tested with build-many-glibcs.py for hppa-linux-gnu. * sysdeps/unix/sysv/linux/hppa/bits/mman.h: Include <bits/mman-linux.h>. (PROT_READ): Don't define here. (PROT_WRITE): Likewise. (PROT_EXEC): Likewise. (PROT_NONE): Likewise. (PROT_GROWSDOWN): Likewise. (PROT_GROWSUP): Likewise. (MAP_SHARED): Likewise. (MAP_PRIVATE): Likewise. [__USE_MISC] (MAP_SHARED_VALIDATE): Likewise. [__USE_MISC] (MAP_FILE): Likewise. [__USE_MISC] (MAP_ANONYMOUS): Likewise. [__USE_MISC] (MAP_ANON): Likewise. [__USE_MISC] (MAP_HUGE_SHIFT): Likewise. [__USE_MISC] (MAP_HUGE_MASK): Likewise. (MCL_CURRENT): Likewise. (MCL_FUTURE): Likewise. (MCL_ONFAULT): Likewise. [__USE_MISC] (MADV_NORMAL): Likewise. [__USE_MISC] (MADV_RANDOM): Likewise. [__USE_MISC] (MADV_SEQUENTIAL): Likewise. [__USE_MISC] (MADV_WILLNEED): Likewise. [__USE_MISC] (MADV_DONTNEED): Likewise. [__USE_MISC] (MADV_FREE): Likewise. [__USE_MISC] (MADV_REMOVE): Likewise. [__USE_MISC] (MADV_DONTFORK): Likewise. [__USE_MISC] (MADV_DOFORK): Likewise. [__USE_MISC] (MADV_HWPOISON): Likewise. [__USE_XOPEN2K] (POSIX_MADV_NORMAL): Likewise. [__USE_XOPEN2K] (POSIX_MADV_RANDOM): Likewise. [__USE_XOPEN2K] (POSIX_MADV_SEQUENTIAL): Likewise. [__USE_XOPEN2K] (POSIX_MADV_WILLNEED): Likewise. [__USE_XOPEN2K] (POSIX_MADV_DONTNEED): Likewise. (__MAP_ANONYMOUS): New macro. [__USE_MISC] (MAP_TYPE): Undefine and redefine after <bits/mman-linux.h> inclusion. (MAP_FIXED): Likewise. (MS_SYNC): Likewise. (MS_ASYNC): Likewise. (MS_INVALIDATE): Likewise. [__USE_MISC] (MADV_MERGEABLE): Likewise. [__USE_MISC] (MADV_UNMERGEABLE): Likewise. [__USE_MISC] (MADV_HUGEPAGE): Likewise. [__USE_MISC] (MADV_NOHUGEPAGE): Likewise. [__USE_MISC] (MADV_DONTDUMP): Likewise. [__USE_MISC] (MADV_DODUMP): Likewise. [__USE_MISC] (MADV_WIPEONFORK): Likewise. [__USE_MISC] (MADV_KEEPONFORK): Likewise.
2018-10-04Fix libnldbl_nonshared.a references to internal libm symbols (bug 23735).Joseph Myers3-1/+50
The redirection of built-in functions such as sqrt in include/math.h applies when the wrappers for those functions in libnldbl_nonshared.a are built, resulting in references to internal names such as __ieee754_sqrt that aren't actually exported from the shared libm. (This applies for sqrt in 2.28, also for the round-to-integer functions in current master because of my changes there.) This patch arranges for NO_MATH_REDIRECT to be used for all the affected functions, and adds a test for those functions in libnldbl_nonshared.a. (We could of course choose to obsolete libnldbl_nonshared.a and require that people building with -mlong-double-64 either include the relevant headers and have a compiler supporting asm redirection, or have some other means of achieving that redirection at compile time if not including those headers. But while we have libnldbl_nonshared.a, it seems appropriate to fix such bugs in it.) Tested for powerpc, and with build-many-glibcs.py. [BZ #23735] * sysdeps/ieee754/ldbl-opt/nldbl-compat.h (NO_MATH_REDIRECT): Define. * sysdeps/ieee754/ldbl-opt/test-nldbl-redirect.c: New file. * sysdeps/ieee754/ldbl-opt/Makefile [$(subdir) = math] (tests): Add test-nldbl-redirect. [$(subdir) = math] (CFLAGS-test-nldbl-redirect.c): New variable. [$(subdir) = math] ($(objpfx)test-nldbl-redirect): Depend on $(objpfx)libnldbl_nonshared.a.
2018-10-02sysdeps/ieee754/soft-fp: ignore maybe-uninitialized with -O [BZ #19444]Martin Jansa1-0/+12
* with -O, -O1, -Os it fails with: In file included from ../soft-fp/soft-fp.h:318, from ../sysdeps/ieee754/soft-fp/s_fdiv.c:28: ../sysdeps/ieee754/soft-fp/s_fdiv.c: In function '__fdiv': ../soft-fp/op-2.h:98:25: error: 'R_f1' may be used uninitialized in this function [-Werror=maybe-uninitialized] X##_f0 = (X##_f1 << (_FP_W_TYPE_SIZE - (N)) | X##_f0 >> (N) \ ^~ ../sysdeps/ieee754/soft-fp/s_fdiv.c:38:14: note: 'R_f1' was declared here FP_DECL_D (R); ^ ../soft-fp/op-2.h:37:36: note: in definition of macro '_FP_FRAC_DECL_2' _FP_W_TYPE X##_f0 _FP_ZERO_INIT, X##_f1 _FP_ZERO_INIT ^ ../soft-fp/double.h:95:24: note: in expansion of macro '_FP_DECL' # define FP_DECL_D(X) _FP_DECL (2, X) ^~~~~~~~ ../sysdeps/ieee754/soft-fp/s_fdiv.c:38:3: note: in expansion of macro 'FP_DECL_D' FP_DECL_D (R); ^~~~~~~~~ ../soft-fp/op-2.h:101:17: error: 'R_f0' may be used uninitialized in this function [-Werror=maybe-uninitialized] : (X##_f0 << (_FP_W_TYPE_SIZE - (N))) != 0)); \ ^~ ../sysdeps/ieee754/soft-fp/s_fdiv.c:38:14: note: 'R_f0' was declared here FP_DECL_D (R); ^ ../soft-fp/op-2.h:37:14: note: in definition of macro '_FP_FRAC_DECL_2' _FP_W_TYPE X##_f0 _FP_ZERO_INIT, X##_f1 _FP_ZERO_INIT ^ ../soft-fp/double.h:95:24: note: in expansion of macro '_FP_DECL' # define FP_DECL_D(X) _FP_DECL (2, X) ^~~~~~~~ ../sysdeps/ieee754/soft-fp/s_fdiv.c:38:3: note: in expansion of macro 'FP_DECL_D' FP_DECL_D (R); ^~~~~~~~~ Build tested with Yocto for ARM, AARCH64, X86, X86_64, PPC, MIPS, MIPS64 with -O, -O1, -Os. For AARCH64 it needs one more fix in locale for -Os. [BZ #19444] * sysdeps/ieee754/soft-fp/s_fdiv.c: Include <libc-diag.h> and use DIAG_PUSH_NEEDS_COMMENT, DIAG_IGNORE_NEEDS_COMMENT and DIAG_POP_NEEDS_COMMENT to disable -Wmaybe-uninitialized.
2018-10-02Fix build from commit 0b727edAdhemerval Zanella1-0/+2
* sysdeps/unix/sysv/linux/fd_to_filename.h: Add missing includes.
2018-10-02x86: Use RTM intrinsics in pthread mutex lock elisionH.J. Lu2-67/+7
Since RTM intrinsics are supported in GCC 4.9, we can use them in pthread mutex lock elision. * sysdeps/unix/sysv/linux/x86/Makefile (CFLAGS-elision-lock.c): Add -mrtm. (CFLAGS-elision-unlock.c): Likewise. (CFLAGS-elision-timed.c): Likewise. (CFLAGS-elision-trylock.c): Likewise. * sysdeps/unix/sysv/linux/x86/hle.h: Rewritten.
2018-10-02libio: Flush stream at freopen (BZ#21037)Adhemerval Zanella2-23/+15
As POSIX states [1] a freopen call should first flush the stream as if by a call fflush. C99 (n1256) and C11 (n1570) only states the function should first close any file associated with the specific stream. Although current implementation only follow C specification, current BSD and other libc implementation (musl) are in sync with POSIX and fflush the stream. This patch change freopen{64} to fflush the stream before actually reopening it (or returning if the stream does not support reopen). It also changes the Linux implementation to avoid a dynamic allocation on 'fd_to_filename'. Checked on x86_64-linux-gnu. [BZ #21037] * libio/Makefile (tests): Add tst-memstream4 and tst-wmemstream4. * libio/freopen.c (freopen): Sync stream before reopen and adjust to new fd_to_filename interface. * libio/freopen64.c (freopen64): Likewise. * libio/tst-memstream.h: New file. * libio/tst-memstream4.c: Likewise. * libio/tst-wmemstream4.c: Likewise. * sysdeps/generic/fd_to_filename.h (fd_to_filename): Change signature. * sysdeps/unix/sysv/linux/fd_to_filename.h (fd_to_filename): Likewise and remove internal dynamic allocation. [1] http://pubs.opengroup.org/onlinepubs/9699919799/
2018-10-01Move MREMAP_* to bits/mman-shared.h.Joseph Myers3-12/+4
The MREMAP_* flags are identical between bits/mman-linux.h and the hppa bits/mman.h; thus, they should be in bits/mman-shared.h instead to avoid unnecessary duplication. This patch moves them there. Tested for x86_64, and with build-many-glibcs.py. * sysdeps/unix/sysv/linux/bits/mman-linux.h [__USE_GNU] (MREMAP_MAYMOVE): Do not define here. [__USE_GNU] (MREMAP_FIXED): Likewise. * sysdeps/unix/sysv/linux/bits/mman-shared.h [__USE_GNU] (MREMAP_MAYMOVE): Define here instead. [__USE_GNU] (MREMAP_FIXED): Likewise. * sysdeps/unix/sysv/linux/hppa/bits/mman.h [__USE_GNU] (MREMAP_MAYMOVE): Remove. [__USE_GNU] (MREMAP_FIXED): Likewise.
2018-09-28Remove unnecessary math_private.h includes.Joseph Myers72-72/+2
After my changes to move various macros, inlines and other content from math_private.h to more specific headers, many files including math_private.h no longer need to do so. Furthermore, since the optimized inlines of various functions have been moved to include/fenv.h or replaced by use of function names GCC inlines automatically, a missing math_private.h include where one is appropriate will reliably cause a build failure rather than possibly causing code to be less well optimized while still building successfully. Thus, this patch removes includes of math_private.h that are now unnecessary. In the case of two RISC-V files, the include is replaced by one of stdbool.h because the files in question were relying on math_private.h to get a definition of bool. Tested for x86_64 and x86, and with build-many-glibcs.py. * math/fromfp.h: Do not include <math_private.h>. * math/s_cacosh_template.c: Likewise. * math/s_casin_template.c: Likewise. * math/s_casinh_template.c: Likewise. * math/s_ccos_template.c: Likewise. * math/s_cproj_template.c: Likewise. * math/s_fdim_template.c: Likewise. * math/s_fmaxmag_template.c: Likewise. * math/s_fminmag_template.c: Likewise. * math/s_iseqsig_template.c: Likewise. * math/s_ldexp_template.c: Likewise. * math/s_nextdown_template.c: Likewise. * math/w_log1p_template.c: Likewise. * math/w_scalbln_template.c: Likewise. * sysdeps/aarch64/fpu/feholdexcpt.c: Likewise. * sysdeps/aarch64/fpu/fesetround.c: Likewise. * sysdeps/aarch64/fpu/fgetexcptflg.c: Likewise. * sysdeps/aarch64/fpu/ftestexcept.c: Likewise. * sysdeps/aarch64/fpu/s_llrint.c: Likewise. * sysdeps/aarch64/fpu/s_llrintf.c: Likewise. * sysdeps/aarch64/fpu/s_lrint.c: Likewise. * sysdeps/aarch64/fpu/s_lrintf.c: Likewise. * sysdeps/i386/fpu/s_atanl.c: Likewise. * sysdeps/i386/fpu/s_f32xaddf64.c: Likewise. * sysdeps/i386/fpu/s_f32xsubf64.c: Likewise. * sysdeps/i386/fpu/s_fdim.c: Likewise. * sysdeps/i386/fpu/s_logbl.c: Likewise. * sysdeps/i386/fpu/s_rintl.c: Likewise. * sysdeps/i386/fpu/s_significandl.c: Likewise. * sysdeps/ia64/fpu/s_matherrf.c: Likewise. * sysdeps/ia64/fpu/s_matherrl.c: Likewise. * sysdeps/ieee754/dbl-64/s_atan.c: Likewise. * sysdeps/ieee754/dbl-64/s_cbrt.c: Likewise. * sysdeps/ieee754/dbl-64/s_fma.c: Likewise. * sysdeps/ieee754/dbl-64/s_fmaf.c: Likewise. * sysdeps/ieee754/flt-32/s_cbrtf.c: Likewise. * sysdeps/ieee754/k_standardf.c: Likewise. * sysdeps/ieee754/k_standardl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_copysignl.c: Likewise. * sysdeps/ieee754/ldbl-64-128/s_finitel.c: Likewise. * sysdeps/ieee754/ldbl-64-128/s_fpclassifyl.c: Likewise. * sysdeps/ieee754/ldbl-64-128/s_isinfl.c: Likewise. * sysdeps/ieee754/ldbl-64-128/s_isnanl.c: Likewise. * sysdeps/ieee754/ldbl-64-128/s_signbitl.c: Likewise. * sysdeps/ieee754/ldbl-96/s_cbrtl.c: Likewise. * sysdeps/ieee754/ldbl-96/s_fma.c: Likewise. * sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise. * sysdeps/ieee754/s_signgam.c: Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c: Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c: Likewise. * sysdeps/powerpc/power7/fpu/s_logbf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise. * sysdeps/riscv/rv64/rvd/s_floor.c: Likewise. * sysdeps/riscv/rv64/rvd/s_nearbyint.c: Likewise. * sysdeps/riscv/rv64/rvd/s_round.c: Likewise. * sysdeps/riscv/rv64/rvd/s_roundeven.c: Likewise. * sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise. * sysdeps/riscv/rvd/s_finite.c: Likewise. * sysdeps/riscv/rvd/s_fmax.c: Likewise. * sysdeps/riscv/rvd/s_fmin.c: Likewise. * sysdeps/riscv/rvd/s_fpclassify.c: Likewise. * sysdeps/riscv/rvd/s_isinf.c: Likewise. * sysdeps/riscv/rvd/s_isnan.c: Likewise. * sysdeps/riscv/rvd/s_issignaling.c: Likewise. * sysdeps/riscv/rvf/fegetround.c: Likewise. * sysdeps/riscv/rvf/feholdexcpt.c: Likewise. * sysdeps/riscv/rvf/fesetenv.c: Likewise. * sysdeps/riscv/rvf/fesetround.c: Likewise. * sysdeps/riscv/rvf/feupdateenv.c: Likewise. * sysdeps/riscv/rvf/fgetexcptflg.c: Likewise. * sysdeps/riscv/rvf/ftestexcept.c: Likewise. * sysdeps/riscv/rvf/s_ceilf.c: Likewise. * sysdeps/riscv/rvf/s_finitef.c: Likewise. * sysdeps/riscv/rvf/s_floorf.c: Likewise. * sysdeps/riscv/rvf/s_fmaxf.c: Likewise. * sysdeps/riscv/rvf/s_fminf.c: Likewise. * sysdeps/riscv/rvf/s_fpclassifyf.c: Likewise. * sysdeps/riscv/rvf/s_isinff.c: Likewise. * sysdeps/riscv/rvf/s_isnanf.c: Likewise. * sysdeps/riscv/rvf/s_issignalingf.c: Likewise. * sysdeps/riscv/rvf/s_nearbyintf.c: Likewise. * sysdeps/riscv/rvf/s_roundevenf.c: Likewise. * sysdeps/riscv/rvf/s_roundf.c: Likewise. * sysdeps/riscv/rvf/s_truncf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_rint.c: Include <stdbool.h> instead of <math_private.h>. * sysdeps/riscv/rvf/s_rintf.c: Likewise.
2018-09-28i386: Use _dl_runtime_[resolve|profile]_shstk for SHSTK [BZ #23716]H.J. Lu2-69/+11
When elf_machine_runtime_setup is called to set up resolver, it should use _dl_runtime_resolve_shstk or _dl_runtime_profile_shstk if SHSTK is enabled by kernel. Tested on i686 with and without --enable-cet as well as on CET emulator with --enable-cet. [BZ #23716] * sysdeps/i386/dl-cet.c: Removed. * sysdeps/i386/dl-machine.h (_dl_runtime_resolve_shstk): New prototype. (_dl_runtime_profile_shstk): Likewise. (elf_machine_runtime_setup): Use _dl_runtime_profile_shstk or _dl_runtime_resolve_shstk if SHSTK is enabled by kernel. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2018-09-28Fix misreported errno on preadv2/pwritev2 (BZ#23579)Adhemerval Zanella4-4/+4
The fallback code of Linux wrapper for preadv2/pwritev2 executes regardless of the errno code for preadv2, instead of the case where the syscall is not supported. This fixes it by calling the fallback code iff errno is ENOSYS. The patch also adds tests for both invalid file descriptor and invalid iov_len and vector count. The only discrepancy between preadv2 and fallback code regarding error reporting is when an invalid flags are used. The fallback code bails out earlier with ENOTSUP instead of EINVAL/EBADF when the syscall is used. Checked on x86_64-linux-gnu on a 4.4.0 and 4.15.0 kernel. [BZ #23579] * misc/tst-preadvwritev2-common.c (do_test_with_invalid_fd): New test. * misc/tst-preadvwritev2.c, misc/tst-preadvwritev64v2.c (do_test): Call do_test_with_invalid_fd. * sysdeps/unix/sysv/linux/preadv2.c (preadv2): Use fallback code iff errno is ENOSYS. * sysdeps/unix/sysv/linux/preadv64v2.c (preadv64v2): Likewise. * sysdeps/unix/sysv/linux/pwritev2.c (pwritev2): Likewise. * sysdeps/unix/sysv/linux/pwritev64v2.c (pwritev64v2): Likewise.
2018-09-27Use copysign functions not __copysign functions in glibc libm.Joseph Myers53-92/+99
Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __copysign functions to call the corresponding copysign names instead, with asm redirection to __copysign when the calls are not inlined (all cases are inlined except for IBM long double for powerpc soft-float / e500v1). This eliminates the need for an inline function defining __copysign in terms of __builtin_copysign. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_BINARY_ARGS): New macro. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (copysign): Redirect using MATH_REDIRECT. * sysdeps/alpha/fpu/s_copysign.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/alpha/fpu/s_copysignf.c: Likewise. * sysdeps/ieee754/dbl-64/s_copysign.c: Likewise. * sysdeps/ieee754/float128/s_copysignf128.c: Likewise. * sysdeps/ieee754/flt-32/s_copysignf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_copysignl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_copysignl.c: Likewise. * sysdeps/ieee754/ldbl-96/s_copysignl.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysignf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysignf.c: Likewise. * sysdeps/riscv/rvd/s_copysign.c: Likewise. * sysdeps/riscv/rvf/s_copysignf.c: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_copysign.c: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_copysignf.c: Likewise. * sysdeps/generic/math_private_calls.h [!__MATH_DECLARING_LONG_DOUBLE || !NO_LONG_DOUBLE] (__copysign): Do not declare and define as an inline function. * math/divtc3.c (__divtc3): Use copysign functions instead of __copysign variants. * math/multc3.c (__multc3): Likewise. * sysdeps/generic/math-type-macros.h (M_COPYSIGN): Likewise. * sysdeps/ieee754/dbl-64/e_atan2.c (signArctan2): Likewise. * sysdeps/ieee754/dbl-64/e_atanh.c (__ieee754_atanh): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Likewise. * sysdeps/ieee754/dbl-64/e_jn.c (__ieee754_jn): Likewise. (__ieee754_yn): Likewise. * sysdeps/ieee754/dbl-64/s_asinh.c (__asinh): Likewise. * sysdeps/ieee754/dbl-64/s_atan.c (__signArctan): Likewise. * sysdeps/ieee754/dbl-64/s_scalbln.c (__scalbln): Likewise. * sysdeps/ieee754/dbl-64/s_scalbn.c (__scalbn): Likewise. * sysdeps/ieee754/dbl-64/s_sin.c (do_sin): Likewise. (__sin): Likewise. * sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_nearbyint.c (__nearbyint): Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_scalbln.c (__scalbln): Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_scalbn.c (__scalbn): Likewise. * sysdeps/ieee754/flt-32/e_atanhf.c (__ieee754_atanhf): Likewise. * sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r): Likewise. * sysdeps/ieee754/flt-32/e_jnf.c (__ieee754_jnf): Likewise. (__ieee754_ynf): Likewise. * sysdeps/ieee754/flt-32/s_asinhf.c (__asinhf): Likewise. * sysdeps/ieee754/flt-32/s_scalbnf.c (__scalbnf): Likewise. * sysdeps/ieee754/k_standard.c (__kernel_standard): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-128/e_jnl.c (__ieee754_jnl): Likewise. (__ieee754_ynl): Likewise. * sysdeps/ieee754/ldbl-128/s_scalblnl.c (__scalblnl): Likewise. * sysdeps/ieee754/ldbl-128/s_scalbnl.c (__scalbnl): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_jnl.c (__ieee754_jnl): Likewise. (__ieee754_ynl): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_fmal.c (__fmal): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_scalblnl.c (__scalblnl): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_scalbnl.c (__scalbnl): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-96/e_jnl.c (__ieee754_jnl): Likewise. (__ieee754_ynl) * sysdeps/ieee754/ldbl-96/s_asinhl.c (__asinhl): Likewise. * sysdeps/ieee754/ldbl-96/s_scalblnl.c (__scalblnl): Likewise. * sysdeps/ieee754/ldbl-opt/nldbl-copysign.c (copysignl): Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
2018-09-27Use round functions not __round functions in glibc libm.Joseph Myers23-35/+33
Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __round functions to call the corresponding round names instead, with asm redirection to __round when the calls are not inlined. An additional complication arises in sysdeps/ieee754/ldbl-128ibm/e_expl.c, where a call to roundl, with the result converted to int, gets converted by the compiler to call lroundl in the case of 32-bit long, so resulting in localplt test failures. It's logically correct to let the compiler make such an optimization; an appropriate asm redirection of lroundl to __lroundl is thus added to that file (it's not needed anywhere else). Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (round): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_round.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_roundf.c: Likewise. * sysdeps/ieee754/dbl-64/s_round.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_round.c: Likewise. * sysdeps/ieee754/float128/s_roundf128.c: Likewise. * sysdeps/ieee754/flt-32/s_roundf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_roundl.c: Likewise. * sysdeps/ieee754/ldbl-96/s_roundl.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_round.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_roundf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_round.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_round.c: Likewise. * sysdeps/riscv/rvf/s_roundf.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_roundl.c: Likewise. (round): Redirect to __round. (__roundl): Call round instead of __round. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__round): Remove macro. [_ARCH_PWR5X] (__roundf): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use round functions instead of __round variants. * sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/x86/fpu/powl_helper.c (__powl_helper): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_expl.c (lroundl): Redirect to __lroundl. (__ieee754_expl): Call roundl instead of __roundl.
2018-09-26Add missing unwind information to ld.so on powerpc32 (bug 23707)Andreas Schwab1-0/+3
2018-09-26Share MAP_* flags between more architectures.Joseph Myers5-46/+49
Continuing bits/mman.h unification between architectures using the Linux kernel, this patch arranges for the common set of MAP_* flags to be used by two more architectures. That common set is moved to bits/mman-map-flags-generic.h, which is included by bits/mman.h, to allow architectures to use that common set even if they also have architecture-specific additions to it. As well as the generic bits/mman.h, the versions for x86 and ia64 are also then made to include bits/mman-map-flags-generic.h, so while they still need architecture-specific bits/mman.h (for MAP_32BIT and MAP_GROWSUP respectively), they do not need to duplicate the generic flag definitions in there. Tested for x86_64 and x86, and with build-many-glibcs.py. * sysdeps/unix/sysv/linux/bits/mman-map-flags-generic.h: New file. Most contents moved from .... * sysdeps/unix/sysv/linux/bits/mman.h: ... here. Move contents to and include <bits/mman-map-flags-generic.h>. * sysdeps/unix/sysv/linux/Makefile [$(subdir) = misc] (sysdep_headers): Add bits/mman-map-flags-generic.h. * sysdeps/unix/sysv/linux/ia64/bits/mman.h: Include <bits/mman-map-flags-generic.h>. [__USE_MISC] (MAP_GROWSUP): Only define this macro, not other macros defined in <bits/mman-map-flags-generic.h>. * sysdeps/unix/sysv/linux/x86/bits/mman.h: Include <bits/mman-map-flags-generic.h>. [__USE_MISC] (MAP_32BIT): Only define this macro, not other macros defined in <bits/mman-map-flags-generic.h>.
2018-09-25Complete sys/procfs.h unification.Joseph Myers6-130/+94
This patch completes the process of unifying sys/procfs.h headers for architectures using the Linux kernel by making alpha use the generic version. That was previously deferred because alpha has different definitions of prgregset_t and prfpregset_t from other architectures, so changing to the common definitions would change C++ name mangling. To avoid such a change, a header bits/procfs-prregset.h is added, and alpha gets its own version of that header. Tested for x86_64 and x86, and with build-many-glibcs.py. * sysdeps/unix/sysv/linux/sys/procfs.h: Include <bits/procfs-prregset.h>. (prgregset_t): Define using __prgregset_t. (prfpregset_t): Define using __prfpregset_t. * sysdeps/unix/sysv/linux/Makefile [$(subdir) = misc] (sysdep_headers): Add bits/procfs-prregset.h. * sysdeps/unix/sysv/linux/bits/procfs-prregset.h: New file. * sysdeps/unix/sysv/linux/alpha/bits/procfs-prregset.h: Likewise. * sysdeps/unix/sysv/linux/alpha/bits/procfs.h: Likewise. * sysdeps/unix/sysv/linux/alpha/sys/procfs.h: Remove file.
2018-09-25Unify more sys/procfs.h headers.Joseph Myers24-892/+619
This patch continues the process of unifying sys/procfs.h headers for architectures using the Linux kernel. A bits/procfs-id.h header is added to define __pr_uid_t and __pr_gid_t for the types of pr_uid and pr_gid; the default version of this header uses unsigned int. On some architectures, sys/procfs.h has copies of 32-bit structures for 64-bit builds; those move into a bits/procfs-extra.h header (they can't go in bits/procfs.h because they have to come *after* other declarations from sys/procfs.h). Given appropriate versions of these headers, six more architectures can then move to providing only bits/procfs*.h without duplicating the rest of the contents of sys/procfs.h. Only alpha needs a further bits/ header to be added before it can stop having its own sys/procfs.h. Tested for x86_64 and x86, and with build-many-glibcs.py. * sysdeps/unix/sysv/linux/sys/procfs.h: Include <bits/procfs-id.h> and <bits/procfs-extra.h>. (struct elf_prpsinfo): Use __pr_uid_t and __pr_gid_t as types of pr_uid and pr_gid. * sysdeps/unix/sysv/linux/Makefile [$(subdir) = misc] (sysdep_headers): Add bits/procfs-id.h and bits/procfs-extra.h. * sysdeps/unix/sysv/linux/bits/procfs-extra.h: New file. * sysdeps/unix/sysv/linux/bits/procfs-id.h: Likewise. * sysdeps/unix/sysv/linux/arm/bits/procfs-id.h: Likewise. * sysdeps/unix/sysv/linux/arm/bits/procfs.h: Likewise. * sysdeps/unix/sysv/linux/m68k/bits/procfs-id.h: Likewise. * sysdeps/unix/sysv/linux/m68k/bits/procfs.h: Likewise. * sysdeps/unix/sysv/linux/s390/bits/procfs-extra.h: Likewise. * sysdeps/unix/sysv/linux/s390/bits/procfs-id.h: Likewise. * sysdeps/unix/sysv/linux/s390/bits/procfs.h: Likewise. * sysdeps/unix/sysv/linux/sh/bits/procfs-id.h: Likewise. * sysdeps/unix/sysv/linux/sh/bits/procfs.h: Likewise. * sysdeps/unix/sysv/linux/sparc/bits/procfs-extra.h: Likewise. * sysdeps/unix/sysv/linux/sparc/bits/procfs-id.h: Likewise. * sysdeps/unix/sysv/linux/sparc/bits/procfs.h: Likewise. * sysdeps/unix/sysv/linux/x86/bits/procfs-id.h: Likewise. * sysdeps/unix/sysv/linux/x86/bits/procfs.h: Likewise. * sysdeps/unix/sysv/linux/arm/sys/procfs.h: Remove file. * sysdeps/unix/sysv/linux/m68k/sys/procfs.h: Likewise. * sysdeps/unix/sysv/linux/s390/sys/procfs.h: Likewise. * sysdeps/unix/sysv/linux/sh/sys/procfs.h: Likewise. * sysdeps/unix/sysv/linux/sparc/sys/procfs.h: Likewise. * sysdeps/unix/sysv/linux/x86/sys/procfs.h: Likewise.
2018-09-25Unify some sys/procfs.h headers.Joseph Myers19-1015/+323
As per recent discussions, this patch unifies some of the sys/procfs.h headers for architectures using the Linux kernel, producing a generic version that can hopefully be used by all new architectures as well. The new generic version is based on the AArch64 one. The register definitions, the only part that generally needs to vary by architecture, go in a new bits/procfs.h header (which each architecture using the generic version needs to provide); that header also has any #includes that were in the architecture-specific sys/procfs.h, where those includes went beyond the generic set. The generic version is used for eight architectures where the generic definitions were the same as the architecture-specific ones. (Some of those architectures had #if 0 fields, now removed; some defined types or fields using different type names which were typedefs for the same underlying types.) Six of the remaining architectures with their own sys/procfs.h use unsigned short for pr_uid / pr_gid in some cases; moving those to the generic header will require a bits/ header to define a typedef for the type of those fields. In the case of alpha, the generic sys/procfs.h uses elf_gregset_t (= unsigned long int[33]) to define prgregset_t and elf_fpregset_t (= double[32]) to define prfpregset_t, but the alpha version uses gregset_t (= long int[33]) and fpregset_t (= long int[32]), so avoiding unnecessarily changing the underlying types (and thus C++ name mangling) again means a bits/ header will need to be able to define a different choice for those typedefs. bits/procfs.h is included outside the __BEGIN_DECLS / __END_DECLS pair (whereas the definitions it contains were previously inside that pair in various sys/procfs.h headers), because it sometimes includes other headers and putting those other #includes inside that pair seems risky. Because none of the declarations in bits/procfs.h are of functions or variables or involve function types, I don't think it makes any difference whether they are inside or outside an extern "C" context. Tested with build-many-glibcs.py (again, that does not provide much validation for the correctness of this patch). * sysdeps/unix/sysv/linux/sys/procfs.h: Replace with file based on AArch64 version. Include <bits/procfs.h>. * sysdeps/unix/sysv/linux/Makefile [$(subdir) = misc] (sysdep_headers): Add bits/procfs.h. * sysdeps/unix/sysv/linux/bits/procfs.h: New file. * sysdeps/unix/sysv/linux/aarch64/bits/procfs.h: Likewise. * sysdeps/unix/sysv/linux/hppa/bits/procfs.h: Likewise. * sysdeps/unix/sysv/linux/ia64/bits/procfs.h: Likewise. * sysdeps/unix/sysv/linux/microblaze/bits/procfs.h: Likewise. * sysdeps/unix/sysv/linux/mips/bits/procfs.h: Likewise. * sysdeps/unix/sysv/linux/nios2/bits/procfs.h: Likewise. * sysdeps/unix/sysv/linux/powerpc/bits/procfs.h: Likewise. * sysdeps/unix/sysv/linux/riscv/bits/procfs.h: Likewise. * sysdeps/unix/sysv/linux/aarch64/sys/procfs.h: Remove file. * sysdeps/unix/sysv/linux/hppa/sys/procfs.h: Likewise. * sysdeps/unix/sysv/linux/ia64/sys/procfs.h: Likewise. * sysdeps/unix/sysv/linux/microblaze/sys/procfs.h: Likewise. * sysdeps/unix/sysv/linux/mips/sys/procfs.h: Likewise. * sysdeps/unix/sysv/linux/nios2/sys/procfs.h: Likewise. * sysdeps/unix/sysv/linux/powerpc/sys/procfs.h: Likewise. * sysdeps/unix/sysv/linux/riscv/sys/procfs.h: Likewise.
2018-09-21powerpc: Only enable TLE with PPC_FEATURE2_HTM_NOSCAdhemerval Zanella10-100/+21
Linux from 3.9 through 4.2 does not abort HTM transaction on syscalls, instead it suspend and resume it when leaving the kernel. The side-effects of the syscall will always remain visible, even if the transaction is aborted. This is an issue when transaction is used along with futex syscall, on pthread_cond_wait for instance, where the futex call might succeed but the transaction is rolled back leading the pthread_cond object in an inconsistent state. Glibc used to prevent it by always aborting a transaction before issuing a syscall. Linux 4.2 also decided to abort active transaction in syscalls which makes the glibc workaround superfluous. Worse, glibc transaction abortion leads to a performance issue on recent kernels where the HTM state is saved/restore lazily (v4.9). By aborting a transaction on every syscalls, regardless whether a transaction has being initiated before, GLIBS makes the kernel always save/restore HTM state (it can not even lazily disable it after a certain number of syscall iterations). Because of this shortcoming, Transactional Lock Elision is just enabled when it has been explicitly set (either by tunables of by a configure switch) and if kernel aborts HTM transactions on syscalls (PPC_FEATURE2_HTM_NOSC). It is reported that using simple benchmark [1], the context-switch is about 5% faster by not issuing a tabort in every syscall in newer kernels. Checked on powerpc64le-linux-gnu with 4.4.0 kernel (Ubuntu 16.04). * NEWS: Add note about new TLE support on powerpc64le. * sysdeps/powerpc/nptl/tcb-offsets.sym (TM_CAPABLE): Remove. * sysdeps/powerpc/nptl/tls.h (tcbhead_t): Rename tm_capable to __ununsed1. (TLS_INIT_TP, TLS_DEFINE_INIT_TP): Remove tm_capable setup. (THREAD_GET_TM_CAPABLE, THREAD_SET_TM_CAPABLE): Remove macros. * sysdeps/powerpc/powerpc32/sysdep.h, sysdeps/powerpc/powerpc64/sysdep.h (ABORT_TRANSACTION_IMPL, ABORT_TRANSACTION): Remove macros. * sysdeps/powerpc/sysdep.h (ABORT_TRANSACTION): Likewise. * sysdeps/unix/sysv/linux/powerpc/elision-conf.c (elision_init): Set __pthread_force_elision iff PPC_FEATURE2_HTM_NOSC is set. * sysdeps/unix/sysv/linux/powerpc/powerpc32/sysdep.h, sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h sysdeps/unix/sysv/linux/powerpc/syscall.S (ABORT_TRANSACTION): Remove usage. * sysdeps/unix/sysv/linux/powerpc/not-errno.h: Remove file. Reported-by: Breno Leitão <leitao@debian.org>
2018-09-20Use trunc functions not __trunc functions in glibc libm.Joseph Myers25-31/+34
Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __trunc functions to call the corresponding trunc names instead, with asm redirection to __trunc when the calls are not inlined. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (trunc): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_trunc.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_truncf.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_trunc.c: Likewise. * sysdeps/ieee754/float128/s_truncf128.c: Likewise. * sysdeps/ieee754/dbl-64/s_trunc.c: Likewise. * sysdeps/ieee754/flt-32/s_truncf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_truncl.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise. * sysdeps/riscv/rvf/s_truncf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_trunc_template.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c: Likewise. (ceil): Redirect to __ceil. (floor): Redirect to __floor. (trunc): Redirect to __trunc. (__truncl): Call trunc instead of __trunc. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__trunc): Remove macro. [_ARCH_PWR5X] (__truncf): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Use trunc functions instead of __trunc variants. * sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r): Likewise.
2018-09-20Invert sense of list of i686-class processors in sysdeps/x86/cpu-features.h.Joseph Myers1-18/+7
I noticed that sysdeps/x86/cpu-features.h had conditionals on whether to define HAS_CPUID, HAS_I586 and HAS_I686 with a long list of preprocessor macros for i686-and-later processors which however was out of date. This patch avoids the problem of the list getting out of date by instead having conditionals on all the (few, old) pre-i686 processors for which GCC has preprocessor macros, rather than the (many, expanding list) i686-and-later processors. It seems HAS_I586 and HAS_I686 are unused so the only effect of these macros being missing is that 32-bit glibc built for one of these processors would end up doing runtime detection of CPUID availability. i386 builds are prevented by a configure test so there is no need to allow for them here. __geode__ (no long nops?) and __k6__ (no CMOV, at least according to GCC) are conservatively handled as i586, not i686, here (as noted above, this is a theoretical distinction at present in that only HAS_CPUID appears to be used). Tested for x86. * sysdeps/x86/cpu-features.h [__geode__ || __k6__]: Handle like [__i586__ || __pentium__]. [__i486__]: Handle explicitly. (HAS_CPUID): Define to 1 if above macros are undefined. (HAS_I586): Likewise. (HAS_I686): Likewise.
2018-09-20Linux gethostid: Check for NULL value from gethostbyname_r [BZ #23679]Mingli Yu1-2/+2
A NULL value can happen with certain gethostbyname_r failures.
2018-09-19Fix the documentation comment of checkint in powfSzabolcs Nagy1-1/+2
checkint in powf is not supposed to be used with 0, inf or nan inputs. * sysdeps/ieee754/flt-32/e_powf.c (checkint): Fix documentation.
2018-09-19Add new pow implementationSzabolcs Nagy12-10614/+569
The algorithm is exp(y * log(x)), where log(x) is computed with about 1.3*2^-68 relative error (1.5*2^-68 without fma), returning the result in two doubles, and the exp part uses the same algorithm (and lookup tables) as exp, but takes the input as two doubles and a sign (to handle negative bases with odd integer exponent). The __exp1 internal symbol is no longer necessary. There is separate code path when fma is not available but the worst case error is about 0.54 ULP in both cases. The lookup table and consts for log are 4168 bytes. The .rodata+.text is decreased by 37908 bytes on aarch64. The non-nearest rounding error is less than 1 ULP. Improvements on Cortex-A72 compared to current glibc master: pow thruput: 2.40x in [0.01 11.1]x[0.01 11.1] pow latency: 1.84x in [0.01 11.1]x[0.01 11.1] Tested on aarch64-linux-gnu (defined __FP_FAST_FMA, TOINT_INTRINSICS) and arm-linux-gnueabihf (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and x86_64-linux-gnu (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and powerpc64le-linux-gnu (defined __FP_FAST_FMA, !TOINT_INTRINSICS) targets. * NEWS: Mention pow improvements. * math/Makefile (type-double-routines): Add e_pow_log_data. * sysdeps/generic/math_private.h (__exp1): Remove. * sysdeps/i386/fpu/e_pow_log_data.c: New file. * sysdeps/ia64/fpu/e_pow_log_data.c: New file. * sysdeps/ieee754/dbl-64/Makefile (CFLAGS-e_pow.c): Allow fma contraction. * sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove. (exp_inline): Remove. (__ieee754_exp): Only single double input is handled. * sysdeps/ieee754/dbl-64/e_pow.c: Rewrite. * sysdeps/ieee754/dbl-64/e_pow_log_data.c: New file. * sysdeps/ieee754/dbl-64/math_config.h (issignaling_inline): Define. (__pow_log_data): Define. * sysdeps/ieee754/dbl-64/upow.h: Remove. * sysdeps/ieee754/dbl-64/upow.tbl: Remove. * sysdeps/m68k/m680x0/fpu/e_pow_log_data.c: New file. * sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_pow-fma.c): Allow fma contraction. (CFLAGS-e_pow-fma4.c): Likewise.
2018-09-18Unify many bits/mman.h headers.Joseph Myers8-318/+6
Many bits/mman.h headers for Linux architectures have exactly the same contents, up to whitespace, comments and the number of leading 0s on constants. Specifically, this applies to architectures that, in the Linux kernel, either have no uapi/asm/mman.h, or have one that includes asm-generic/mman.h without any changes or additions relevant to glibc (this last case is the one that applies to Arm). It's not useful to have to duplicate the set of MAP_* constants in glibc for all such architectures and any new architectures with that property. Thus, this patch creates a generic sysdeps/unix/sysv/linux/bits/mman.h and removes all the architecture-specific versions that become unnecessary. Further unification remains possible after this patch. For example, the new bits/mman.h could become bits/mman-map-flags-generic.h so that it could also be used by architecture-specific bits/mman.h headers on architectures that use the generic flags but add architecture-specific ones to them. That would allow this common set of MAP_* definitions to be used on ia64 and x86 as well (architectures that include asm-generic/mman.h from their own uapi/asm/mman.h but define additional MAP_* values of their own). Tested with build-many-glibcs.py. * sysdeps/unix/sysv/linux/bits/mman.h: New file. * sysdeps/unix/sysv/linux/aarch64/bits/mman.h: Remove. * sysdeps/unix/sysv/linux/arm/bits/mman.h: Likewise. * sysdeps/unix/sysv/linux/m68k/bits/mman.h: Likewise. * sysdeps/unix/sysv/linux/microblaze/bits/mman.h: Likewise. * sysdeps/unix/sysv/linux/nios2/bits/mman.h: Likewise. * sysdeps/unix/sysv/linux/riscv/bits/mman.h: Likewise. * sysdeps/unix/sysv/linux/s390/bits/mman.h: Likewise. * sysdeps/unix/sysv/linux/sh/bits/mman.h: Likewise.
2018-09-18Fix ldbl-128ibm ceill, floorl inlining of ceil, floor.Joseph Myers2-4/+8
The ldbl-128ibm implementations of ceill and floorl call the corresponding double functions. This patch fixes those implementations to call those functions as ceil and floor rather than as __ceil and __floor, so that the proper inlining takes place when possible, while including local asm redirections for when the functions are not inlined since NO_MATH_REDIRECT applies to the double functions as well as to the long double ones. Tested with build-many-glibcs.py for all its powerpc configurations. * sysdeps/ieee754/ldbl-128ibm/s_ceill.c (ceil): Redirect to __ceil. (__ceill): Call ceil instead of __ceil. * sysdeps/ieee754/ldbl-128ibm/s_floorl.c (floor): Redirect to __floor. (__floorl): Call floor instead of __floor.
2018-09-17Use ceil functions not __ceil functions in glibc libm.Joseph Myers28-32/+31
Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __ceil functions to call the corresponding ceil names instead, with asm redirection to __ceil when the calls are not inlined. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (ceil): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_ceil.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_ceilf.c: Likewise. * sysdeps/ieee754/dbl-64/s_ceil.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_ceil.c: Likewise. * sysdeps/ieee754/float128/s_ceilf128.c: Likewise. * sysdeps/ieee754/flt-32/s_ceilf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_ceill.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_ceill.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_ceil_template.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise. * sysdeps/riscv/rvf/s_ceilf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__ceil): Remove macro. * sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use ceil functions instead of __ceil variants. * sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
2018-09-17Update siginfo constants from Linux kernel (bug 21286).Joseph Myers2-17/+27
As of Linux 4.17, siginfo headers in the Linux kernel have been largely unified across architectures (so various constants are defined with common values in include/uapi/asm-generic/siginfo.h even if not all architectures can generate those particular constants). This patch makes glibc reflect that unification and the current set of constants in that header as of Linux 4.18. Various constants are added to bits/siginfo-consts.h (under the same feature test macro conditions as the other constants with the same prefix), and removed from the ia64 bits/siginfo-consts-arch.h where they were previously there - this is not limited to constants added by the unification. Nothing is done about macros that are defined in include/uapi/asm-generic/siginfo.h with names with leading '__' (some of those are ia64-specific ones that remain in the ia64 bits/siginfo-consts-arch.h without the leading '__' there). A consequence of these changes is that TRAP_HWBKPT becomes available on AArch64 and all other architectures as requested in bug 21286. Tested for x86_64; tested with build-many-glibcs.py for ia64. [BZ #21286] * sysdeps/unix/sysv/linux/bits/siginfo-consts.h (SI_DETHREAD): New constant. [__USE_XOPEN_EXTENDED || __USE_XOPEN2K8] (ILL_BADIADDR): Likewise. [__USE_XOPEN_EXTENDED || __USE_XOPEN2K8] (FPE_FLTUNK): Likewise. [__USE_XOPEN_EXTENDED || __USE_XOPEN2K8] (FPE_CONDTRAP): Likewise. [__USE_XOPEN_EXTENDED || __USE_XOPEN2K8] (SEGV_ACCADI): Likewise. [__USE_XOPEN_EXTENDED || __USE_XOPEN2K8] (SEGV_ADIDERR): Likewise. [__USE_XOPEN_EXTENDED || __USE_XOPEN2K8] (SEGV_ADIPERR): Likewise. [__USE_XOPEN_EXTENDED] (TRAP_BRANCH): Likewise. [__USE_XOPEN_EXTENDED] (TRAP_HWBKPT): Likewise. [__USE_XOPEN_EXTENDED] (TRAP_UNK): Likweise. * sysdeps/unix/sysv/linux/ia64/bits/siginfo-consts-arch.h (ILL_BADIADDR): Remove constant. (TRAP_BRANCH): Likewise. (TRAP_HWBKPT): Likewise.
2018-09-14Fix MIPS n32 pr_sigpend, pr_sighold, pr_flag type (bug 23656).Joseph Myers1-9/+0
As discussed at <https://sourceware.org/ml/libc-alpha/2018-09/msg00191.html> and followup discussions, the MIPS n32 definitions of pr_sigpend and pr_sighold in struct elf_prstatus, and pr_flag in struct elf_prpsinfo, are wrong to use unsigned long long int; actual n32 core dumps use a 32-bit type there, so userspace unsigned long int is correct for all MIPS ABIs. This patch removes the conditionals (also thereby aligning the structures with other architectures and so facilitating future unification of different versions of this header). Tested with build-many-glibcs.py for its MIPS configurations. [BZ #23656] * sysdeps/unix/sysv/linux/mips/sys/procfs.h (struct elf_prstatus): Remove [_MIPS_SIM = _ABIN32] conditional case. (struct elf_prpsinfo): Likewise.
2018-09-14Fix sys/procfs.h pr_uid, pr_gid type (bug 23649).Joseph Myers5-10/+10
As noted in <https://sourceware.org/ml/libc-alpha/2018-09/msg00178.html>, glibc's sys/procfs.h headers for microblaze, mips (n64), nios2 and riscv have incorrect types for the pr_uid and pr_gid members of struct elf_prpsinfo (as does the generic Linux version, but nothing uses that). This patch fixes those headers to use unsigned int. The generic Linux version is also fixed, but I do *not* recommend making new architectures use it yet. Rather, I think it should be reworked to look more like a copy of the AArch64 version, but with a new <bits/procfs.h> header included to provide register set definitions; <bits/procfs.h> would then be architecture-specific while many architectures could use the generic <sys/procfs.h>. This fix is deliberately separate from any reworking to use a generic header more, since it's possible there could be uses for backporting this fix but not for backporting a subsequent cleanup. Tested with build-many-glibcs.py. This of course doesn't provide much validation of the structure layout; if the Linux kernel is fixed so that "#include <linux/elfcore.h>" actually compiles with the headers from "make headers_install" (and if the layout in both headers is meant to be the same, whatever ABI we are building for), I have a test that can be added to glibc to check the layout against that from the Linux kernel. [BZ #23649] * sysdeps/unix/sysv/linux/microblaze/sys/procfs.h (struct elf_prpsinfo): Use unsigned int for pr_uid and pr_gid. * sysdeps/unix/sysv/linux/mips/sys/procfs.h (struct elf_prpsinfo): Likewise. * sysdeps/unix/sysv/linux/nios2/sys/procfs.h (struct elf_prpsinfo): Likewise. * sysdeps/unix/sysv/linux/riscv/sys/procfs.h (struct elf_prpsinfo): Likewise. * sysdeps/unix/sysv/linux/sys/procfs.h (struct elf_prpsinfo): Likewise.
2018-09-14Use rint functions not __rint functions in glibc libm.Joseph Myers36-45/+39
Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __rint functions to call the corresponding rint names instead, with asm redirection to __rint when the calls are not inlined. The x86_64 math_private.h is removed as no longer useful after this patch. This patch is relative to a tree with my floor patch <https://sourceware.org/ml/libc-alpha/2018-09/msg00148.html> applied, and much the same considerations arise regarding possibly replacing an IFUNC call with a direct inline expansion. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (rint): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_rint.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_rintf.c: Likewise. * sysdeps/alpha/fpu/s_rint.c: Likewise. * sysdeps/alpha/fpu/s_rintf.c: Likewise. * sysdeps/i386/fpu/s_rintl.c: Likewise. * sysdeps/ieee754/dbl-64/s_rint.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_rint.c: Likewise. * sysdeps/ieee754/float128/s_rintf128.c: Likewise. * sysdeps/ieee754/flt-32/s_rintf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_rintl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise. * sysdeps/m68k/coldfire/fpu/s_rint.c: Likewise. * sysdeps/m68k/coldfire/fpu/s_rintf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rint.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rintf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rintl.c: Likewise. * sysdeps/powerpc/fpu/s_rint.c: Likewise. * sysdeps/powerpc/fpu/s_rintf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_rint.c: Likewise. * sysdeps/riscv/rvf/s_rintf.c: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rint.c: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_rint.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rint.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/x86_64/fpu/math_private.h: Remove file. * math/e_scalb.c (invalid_fn): Use rint functions instead of __rint variants. * math/e_scalbf.c (invalid_fn): Likewise. * math/e_scalbl.c (invalid_fn): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Likewise. * sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r): Likewise. * sysdeps/ieee754/k_standard.c (__kernel_standard): Likewise. * sysdeps/ieee754/k_standardl.c (__kernel_standard_l): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise.
2018-09-14Use floor functions not __floor functions in glibc libm.Joseph Myers39-78/+53
Similar to the changes that were made to call sqrt functions directly in glibc, instead of __ieee754_sqrt variants, so that the compiler could inline them automatically without needing special inline definitions in lots of math_private.h headers, this patch makes libm code call floor functions directly instead of __floor variants, removing the inlines / macros for x86_64 (SSE4.1) and powerpc (POWER5). The redirection used to ensure that __ieee754_sqrt does still get called when the compiler doesn't inline a built-in function expansion is refactored so it can be applied to other functions; the refactoring is arranged so it's not limited to unary functions either (it would be reasonable to use this mechanism for copysign - removing the inline in math_private_calls.h but also eliminating unnecessary local PLT entry use in the cases (powerpc soft-float and e500v1, for IBM long double) where copysign calls don't get inlined). The point of this change is that more architectures can get floor calls inlined where they weren't previously (AArch64, for example), without needing special inline definitions in their math_private.h, and existing such definitions in math_private.h headers can be removed. Note that it's possible that in some cases an inline may be used where an IFUNC call was previously used - this is the case on x86_64, for example. I think the direct calls to floor are still appropriate; if there's any significant performance cost from inline SSE2 floor instead of an IFUNC call ending up with SSE4.1 floor, that indicates that either the function should be doing something else that's faster than using floor at all, or it should itself have IFUNC variants, or that the compiler choice of inlining for generic tuning should change to allow for the possibility that, by not inlining, an SSE4.1 IFUNC might be called at runtime - but not that glibc should avoid calling floor internally. (After all, all the same considerations would apply to any user program calling floor, where it might either be inlined or left as an out-of-line call allowing for a possible IFUNC.) Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT): New macro. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_LDBL): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_F128): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_UNARY_ARGS): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (sqrt): Redirect using MATH_REDIRECT. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (floor): Likewise. * sysdeps/aarch64/fpu/s_floor.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_floorf.c: Likewise. * sysdeps/ieee754/dbl-64/s_floor.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_floor.c: Likewise. * sysdeps/ieee754/float128/s_floorf128.c: Likewise. * sysdeps/ieee754/flt-32/s_floorf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_floorl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_floorl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_floor_template.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_floor.c: Likewise. * sysdeps/riscv/rvf/s_floorf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__floor): Remove macro. [_ARCH_PWR5X] (__floorf): Likewise. * sysdeps/x86_64/fpu/math_private.h [__SSE4_1__] (__floor): Remove inline function. [__SSE4_1__] (__floorf): Likewise. * math/w_lgamma_main.c (LGFUNC (__lgamma)): Use floor functions instead of __floor variants. * math/w_lgamma_r_compat.c (__lgamma_r): Likewise. * math/w_lgammaf_main.c (LGFUNC (__lgammaf)): Likewise. * math/w_lgammaf_r_compat.c (__lgammaf_r): Likewise. * math/w_lgammal_main.c (LGFUNC (__lgammal)): Likewise. * math/w_lgammal_r_compat.c (__lgammal_r): Likewise. * math/w_tgamma_compat.c (__tgamma): Likewise. * math/w_tgamma_template.c (M_DECL_FUNC (__tgamma)): Likewise. * math/w_tgammaf_compat.c (__tgammaf): Likewise. * math/w_tgammal_compat.c (__tgammal): Likewise. * sysdeps/ieee754/dbl-64/e_lgamma_r.c (sin_pi): Likewise. * sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2): Likewise. * sysdeps/ieee754/dbl-64/lgamma_neg.c (__lgamma_neg): Likewise. * sysdeps/ieee754/flt-32/e_lgammaf_r.c (sin_pif): Likewise. * sysdeps/ieee754/flt-32/lgamma_negf.c (__lgamma_negf): Likewise. * sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r): Likewise. * sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise. * sysdeps/ieee754/ldbl-128/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/ieee754/ldbl-128/s_expm1l.c (__expm1l): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_lgammal_r.c (__ieee754_lgammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise. * sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise. * sysdeps/ieee754/ldbl-96/e_lgammal_r.c (sin_pi): Likewise. * sysdeps/ieee754/ldbl-96/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
2018-09-12Add new log2 implementationSzabolcs Nagy7-244/+363
Similar algorithm is used as in log: log2(2^k x) = k + log2(c) + log2(x/c) where the last term is approximated by a polynomial of x/c - 1, the first order coefficient is about 1/ln2 in this case. There is separate code path when fma instruction is not available for computing x/c - 1 precisely, for which the table size is doubled. The worst case error is 0.547 ULP (0.55 without fma), the read only global data size is 1168 bytes (2192 without fma) on aarch64. The non-nearest rounding error is less than 1 ULP. Improvements on Cortex-A72 compared to current glibc master: log2 thruput: 2.00x in [0.01 11.1] log2 latency: 2.04x in [0.01 11.1] log2 thruput: 2.17x in [0.999 1.001] log2 latency: 2.88x in [0.999 1.001] Tested on aarch64-linux-gnu (defined __FP_FAST_FMA) arm-linux-gnueabihf (!defined __FP_FAST_FMA) x86_64-linux-gnu (!defined __FP_FAST_FMA) powerpc64le-linxu-gnu (defined __FP_FAST_FMA) targets. * NEWS: Mention log2 improvements. * math/Makefile (type-double-routines): Add e_log2_data. * sysdeps/i386/fpu/e_log2_data.c: New file. * sysdeps/ia64/fpu/e_log2_data.c: New file. * sysdeps/ieee754/dbl-64/e_log2.c: Rewrite. * sysdeps/ieee754/dbl-64/e_log2_data.c: New file. * sysdeps/ieee754/dbl-64/math_config.h (__log2_data): Add. * sysdeps/ieee754/dbl-64/wordsize-64/e_log2.c: Remove. * sysdeps/m68k/m680x0/fpu/e_log2_data.c: New file.
2018-09-12Add new log implementationSzabolcs Nagy8-3565/+477
Optimized log using carefully generated lookup table with 1/c and log(c) values for small intervalls around 1. The log(c) is very near a double precision value, it has about 62 bits precision. The algorithm is log(2^k x) = k log(2) + log(c) + log(x/c), where the last term is approximated by a polynomial of x/c - 1. Near 1 a single polynomial of x - 1 is used. There is separate code path when fma instruction is not available for computing x/c - 1 precisely, in which case the table size is doubled. The code uses __builtin_fma under __FP_FAST_FMA to ensure it is inlined as an instruction. With the default configuration settings the worst case error is 0.519 ULP (and 0.520 without fma), the rodata size is 2192 bytes (4240 without fma). The non-nearest rounding error is less than 1 ULP. Improvements on Cortex-A72 compared to current glibc master: log thruput: 3.28x in [0.01 11.1] log latency: 2.23x in [0.01 11.1] log thruput: 1.56x in [0.999 1.001] log latency: 1.57x in [0.999 1.001] Tested on aarch64-linux-gnu (defined __FP_FAST_FMA) arm-linux-gnueabihf (!defined __FP_FAST_FMA) x86_64-linux-gnu (!defined __FP_FAST_FMA) powerpc64le-linux-gnu (defined __FP_FAST_FMA) targets. * NEWS: Mention log improvement. * math/Makefile (type-double-routines): Add e_log_data. * sysdeps/i386/fpu/e_log_data.c: New file. * sysdeps/ia64/fpu/e_log_data.c: New file. * sysdeps/ieee754/dbl-64/e_log.c: Rewrite. * sysdeps/ieee754/dbl-64/e_log_data.c: New file. * sysdeps/ieee754/dbl-64/math_config.h (__log_data): Add. * sysdeps/ieee754/dbl-64/ulog.h: Remove. * sysdeps/ieee754/dbl-64/ulog.tbl: Remove. * sysdeps/m68k/m680x0/fpu/e_log_data.c: New file.
2018-09-12i386: Use ENTRY and END in start.S [BZ #23606]H.J. Lu1-4/+6
Wrapping the _start function with ENTRY and END to insert ENDBR32 at function entry when CET is enabled. Since _start now includes CFI, without "cfi_undefined (eip)", unwinder may not terminate at _start and we will get Program received signal SIGSEGV, Segmentation fault. 0xf7dc661e in ?? () from /lib/libgcc_s.so.1 Missing separate debuginfos, use: dnf debuginfo-install libgcc-8.2.1-3.0.fc28.i686 (gdb) bt #0 0xf7dc661e in ?? () from /lib/libgcc_s.so.1 #1 0xf7dc7c18 in _Unwind_Backtrace () from /lib/libgcc_s.so.1 #2 0xf7f0d809 in __GI___backtrace (array=array@entry=0xffffc7d0, size=size@entry=20) at ../sysdeps/i386/backtrace.c:127 #3 0x08049254 in compare (p1=p1@entry=0xffffcad0, p2=p2@entry=0xffffcad4) at backtrace-tst.c:12 #4 0xf7e2a28c in msort_with_tmp (p=p@entry=0xffffca5c, b=b@entry=0xffffcad0, n=n@entry=2) at msort.c:65 #5 0xf7e29f64 in msort_with_tmp (n=2, b=0xffffcad0, p=0xffffca5c) at msort.c:53 #6 msort_with_tmp (p=p@entry=0xffffca5c, b=b@entry=0xffffcad0, n=n@entry=5) at msort.c:53 #7 0xf7e29f64 in msort_with_tmp (n=5, b=0xffffcad0, p=0xffffca5c) at msort.c:53 #8 msort_with_tmp (p=p@entry=0xffffca5c, b=b@entry=0xffffcad0, n=n@entry=10) at msort.c:53 #9 0xf7e29f64 in msort_with_tmp (n=10, b=0xffffcad0, p=0xffffca5c) at msort.c:53 #10 msort_with_tmp (p=p@entry=0xffffca5c, b=b@entry=0xffffcad0, n=n@entry=20) at msort.c:53 #11 0xf7e2a5b6 in msort_with_tmp (n=20, b=0xffffcad0, p=0xffffca5c) at msort.c:297 #12 __GI___qsort_r (b=b@entry=0xffffcad0, n=n@entry=20, s=s@entry=4, cmp=cmp@entry=0x8049230 <compare>, arg=arg@entry=0x0) at msort.c:297 #13 0xf7e2a84d in __GI_qsort (b=b@entry=0xffffcad0, n=n@entry=20, s=s@entry=4, cmp=cmp@entry=0x8049230 <compare>) at msort.c:308 #14 0x080490f6 in main (argc=2, argv=0xffffcbd4) at backtrace-tst.c:39 FAIL: debug/backtrace-tst [BZ #23606] * sysdeps/i386/start.S: Include <sysdep.h> (_start): Use ENTRY/END to insert ENDBR32 at entry when CET is enabled. Add cfi_undefined (eip). Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2018-09-11Remove x86_64 math_private.h asms.Joseph Myers1-44/+0
The x86_64 math_private.h has asm versions of the macros to reinterpret between floating-point and integer types. This is the sort of thing we now strongly discourage; the expectation in such cases, where the generic C code gives the compiler all the information needed about the required semantics, is that you should get the compiler to do the right thing for the generic C code rather than writing an asm version. Trivial tests showed GCC generates the expected single instructions for reinterpretation from floating point to integer. In the other direction, it goes via memory when the asms don't; I asked about this in GCC bug 87236 and was advised this was deliberate for generic tuning because it was faster that way on some AMD processors (but -mtune=intel, and -Os with the latest GCC, avoid going via memory). The asms don't and can't know about those tuning details, so that's evidence that they are actually making the code worse. This patch removes the asms accordingly. Tested for x86_64. * sysdeps/x86_64/fpu/math_private.h (MOVD): Remove macro. (MOVQ): Likewise. (EXTRACT_WORDS64): Likewise. (INSERT_WORDS64): Likewise. (GET_FLOAT_WORD): Likewise. (SET_FLOAT_WORD): Likewise.
2018-09-06S390: Regenerate ULPs.Stefan Liebler1-68/+64
Regenerated ulps from scratch after recent changes. ChangeLog: * sysdeps/s390/fpu/libm-test-ulps: Regenerated.
2018-09-06Fix segfault in maybe_script_execute.Stefan Liebler1-1/+1
If glibc is built with gcc 8 and -march=z900, the testcase posix/tst-spawn4-compat crashes with a segfault. In function maybe_script_execute, the new_argv array is dynamically initialized on stack with (argc + 1) elements. The function wants to add _PATH_BSHELL as the first argument and writes out of bounds of new_argv. There is an off-by-one because maybe_script_execute fails to count the terminating NULL when sizing new_argv. ChangeLog: * sysdeps/unix/sysv/linux/spawni.c (maybe_script_execute): Increment size of new_argv by one.
2018-09-05Add new exp and exp2 implementationsSzabolcs Nagy24-3669/+882
Optimized exp and exp2 implementations using a lookup table for fractional powers of 2. There are several variants, see e_exp_data.c, they can be selected by modifying math_config.h allowing different tradeoffs. The default selection should be acceptable as generic libm code. Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on aarch64 the rodata size is 2160 bytes, shared between exp and exp2. On aarch64 .text + .rodata size decreased by 24912 bytes. The non-nearest rounding error is less than 1 ULP even on targets without efficient round implementation (although the error rate is higher in that case). Targets with single instruction, rounding mode independent, to nearest integer rounding and conversion can use them by setting TOINT_INTRINSICS and adding the necessary code to their math_private.h. The __exp1 code uses the same algorithm, so the error bound of pow increased a bit. New double precision error handling code was added following the style of the single precision error handling code. Improvements on Cortex-A72 compared to current glibc master: exp thruput: 1.61x in [-9.9 9.9] exp latency: 1.53x in [-9.9 9.9] exp thruput: 1.13x in [0.5 1] exp latency: 1.30x in [0.5 1] exp2 thruput: 2.03x in [-9.9 9.9] exp2 latency: 1.64x in [-9.9 9.9] For small (< 1) inputs the current exp code uses a separate algorithm so the speed up there is less. Was tested on aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets, only non-nearest rounding ulp errors increase and they are within acceptable bounds (ulp updates are in separate patches). * NEWS: Mention exp and exp2 improvements. * math/Makefile (libm-support): Remove t_exp. (type-double-routines): Add math_err and e_exp_data. * sysdeps/aarch64/libm-test-ulps: Update. * sysdeps/arm/libm-test-ulps: Update. * sysdeps/i386/fpu/e_exp_data.c: New file. * sysdeps/i386/fpu/math_err.c: New file. * sysdeps/i386/fpu/t_exp.c: Remove. * sysdeps/ia64/fpu/e_exp_data.c: New file. * sysdeps/ia64/fpu/math_err.c: New file. * sysdeps/ia64/fpu/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/e_exp.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp_data.c: New file. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound. * sysdeps/ieee754/dbl-64/eexp.tbl: Remove. * sysdeps/ieee754/dbl-64/math_config.h: New file. * sysdeps/ieee754/dbl-64/math_err.c: New file. * sysdeps/ieee754/dbl-64/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/t_exp2.h: Remove. * sysdeps/ieee754/dbl-64/uexp.h: Remove. * sysdeps/ieee754/dbl-64/uexp.tbl: Remove. * sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file. * sysdeps/m68k/m680x0/fpu/math_err.c: New file. * sysdeps/m68k/m680x0/fpu/t_exp.c: Remove. * sysdeps/powerpc/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Update.