path: root/sysdeps
Age  Commit message  Author  Files  Lines
2021-08-27math/test-sinl-pseudo: Use stack protector only if availableFlorian Weimer1-0/+2
This fixes commit 9333498794cde1d5cca518bad ("Avoid ldbl-96 stack corruption from range reduction of pseudo-zero (bug 25487).").
2021-08-27Avoid ldbl-96 stack corruption from range reduction of pseudo-zero (bug 25487).Joseph Myers3-1/+55
Bug 25487 reports stack corruption in ldbl-96 sinl on a pseudo-zero argument (a representation where all the significand bits, including the explicit high bit, are zero, but the exponent is not zero, which is not a valid representation for the long double type). Although this is not a valid long double representation, existing practice in this area (see bug 4586, originally marked invalid but subsequently fixed) is that we still seek to avoid invalid memory accesses as a result, in case of programs that treat arbitrary binary data as long double representations, although the invalid representations of the ldbl-96 format do not need to be consistently handled the same as any particular valid representation.

This patch makes the range reduction detect pseudo-zero and unnormal representations that would otherwise go to __kernel_rem_pio2, and returns a NaN for them instead of continuing with the range reduction process. (Pseudo-zero and unnormal representations whose unbiased exponent is less than -1 have already been safely returned from the function before this point without going through the rest of range reduction.) Pseudo-zero representations would previously result in the value passed to __kernel_rem_pio2 being all-zero, which is definitely unsafe; unnormal representations would previously result in a value passed whose high bit is zero, which might well be unsafe since that is not a form of input expected by __kernel_rem_pio2.

Tested for x86_64.
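The classification described above can be sketched in a few lines of C. This is an illustrative helper, not the code from the patch: the function names are hypothetical, and the caller is assumed to have already split the ldbl-96 encoding into its 15-bit biased exponent and 64-bit significand (whose top bit is the explicit integer bit).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not glibc functions.
   biased_exp is the 15-bit exponent field, significand the full
   64-bit field including the explicit high (integer) bit.  */

/* Pseudo-zero: nonzero finite exponent, all significand bits zero.  */
bool
ldbl96_is_pseudo_zero (uint16_t biased_exp, uint64_t significand)
{
  biased_exp &= 0x7fff;
  return biased_exp != 0 && biased_exp != 0x7fff && significand == 0;
}

/* Pseudo-zero or unnormal: nonzero finite exponent with the explicit
   high bit clear.  These are the encodings for which a range reducer
   could return NaN instead of continuing.  */
bool
ldbl96_is_pseudo_zero_or_unnormal (uint16_t biased_exp, uint64_t significand)
{
  biased_exp &= 0x7fff;
  bool high_bit_set = (significand >> 63) != 0;
  return biased_exp != 0 && biased_exp != 0x7fff && !high_bit_set;
}
```

Exponent fields of zero (true zeros and subnormals) and of all ones (infinities and NaNs) are deliberately excluded, since the commit message only concerns pseudo-zero and unnormal encodings.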
2021-08-27x86_64: Remove unneeded static PIE check for undefined weak diagnosticFangrui Song2-58/+0
https://sourceware.org/bugzilla/show_bug.cgi?id=21782 dropped an ld diagnostic for R_X86_64_PC32 referencing an undefined weak symbol in -pie links. Arguably keeping the diagnostic like other ports is more correct, since statically resolving movl foo(%rip), %eax to the link-time zero address produces a corrupted output. It turns out that --enable-static-pie builds do not depend on the ld behavior. GCC generates GOT indirection for weak declarations for -fPIE/-fPIC, so what ld does with the PC-relative relocation doesn't really matter. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
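As a rough illustration of the kind of code involved, the sketch below declares a weak function that is never defined and checks its address before calling it. The symbol name optional_hook is hypothetical, and the snippet assumes GCC or clang targeting ELF; it is not taken from glibc.

```c
/* Hypothetical weak declaration; no definition is linked in.  */
extern int optional_hook (void) __attribute__ ((weak));

/* Returns the hook's result, or -1 when the weak symbol is undefined
   (its address then compares equal to a null pointer).  */
int
call_optional_hook (void)
{
  if (optional_hook != 0)
    return optional_hook ();
  return -1;
}
```

When such code is built with -fPIE or -fPIC, the address is loaded through the GOT rather than through a PC-relative relocation, which is why the ld behavior described above does not affect it.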
2021-08-27[PATCH 7/7] sin/cos slow paths: refactor sincos implementationWilco Dijkstra2-45/+52
Refactor the sincos implementation - rather than rely on odd partial inlining of preprocessed portions from sin and cos, explicitly write out the cases. This makes sincos much easier to maintain and provides an additional 16-20% speedup between 0 and 2^27. The overall speedup of sincos is 48% over this range. Between 0 and PI it is 66% faster.

* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Cleanup ifdefs.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sin.c (__sincos): Refactor using the same logic as sin and cos.
2021-08-27[PATCH 6/7] sin/cos slow paths: refactor duplicated code into dosinWilco Dijkstra1-27/+13
Refactor duplicated code into do_sin. Since all calls to do_sin use copysign to set the sign of the result, move it inside do_sin. Small inputs use a separate polynomial, so move this into do_sin as well (the check is based on the more conservative case when doing large range reduction, but could be relaxed).

* sysdeps/ieee754/dbl-64/s_sin.c (do_sin): Use TAYLOR_SIN for small inputs. Return correct sign.
(do_sincos): Remove small input check before do_sin, let do_sin set the sign.
(__sin): Likewise.
(__cos): Likewise.
2021-08-27[PATCH 5/7] sin/cos slow paths: remove unused slowpath functionsWilco Dijkstra1-444/+3
Remove all unused slowpath functions.

* sysdeps/ieee754/dbl-64/s_sin.c (TAYLOR_SLOW): Remove.
(do_cos_slow): Likewise.
(do_sin_slow): Likewise.
(reduce_and_compute): Likewise.
(slow): Likewise.
(slow1): Likewise.
(slow2): Likewise.
(sloww): Likewise.
(sloww1): Likewise.
(sloww2): Likewise.
(bslow): Likewise.
(bslow1): Likewise.
(bslow2): Likewise.
(cslow2): Likewise.
2021-08-27[PATCH 4/7] sin/cos slow paths: remove slow paths from huge range reductionWilco Dijkstra2-64/+34
For huge inputs use the improved do_sincos function as well. Now no cases use the correction factor returned by do_sin, do_cos and TAYLOR_SIN, so remove it.

* sysdeps/ieee754/dbl-64/s_sin.c (TAYLOR_SIN): Remove cor parameter.
(do_cos): Remove corp parameter and calculations.
(do_sin): Likewise.
(do_sincos): Remove cor variable.
(__sin): Use do_sincos for huge inputs.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Likewise.
(reduce_and_compute_sincos): Remove unused function.
2021-08-27[PATCH 3/7] sin/cos slow paths: remove slow paths from small range reductionWilco Dijkstra2-53/+47
This patch improves the accuracy of the range reduction. When the input is large (2^27) and very close to a multiple of PI/2, using 110 bits of PI is not enough. Improve range reduction accuracy to 136 bits. As a result the special checks for results close to zero can be removed. The ULP of the polynomials is at worst 0.55ULP, so there is no reason for the slow functions, and they can be removed.

* sysdeps/ieee754/dbl-64/s_sin.c (reduce_sincos_1): Rename to reduce_sincos, improve accuracy to 136 bits.
(do_sincos_1): Rename to do_sincos, remove fallbacks to slow functions.
(__sin): Use improved reduction and simplified do_sincos calculation.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Likewise.
2021-08-27[PATCH 2/7] sin/cos slow paths: remove large range reductionWilco Dijkstra2-103/+2
This patch removes the large range reduction code and defers to the huge range reduction code. The first level range reducer supports inputs up to 2^27, which is way too large given that inputs for sin/cos are typically small (< 10), and optimizing for a smaller range would give a significant speedup. Input values above 2^27 are practically never used, so there is no reason for supporting range reduction between 2^27 and 2^48. Removing it significantly simplifies code and enables further speedups. There is about a 2.3x slowdown in this range due to __branred being extremely slow (a better algorithm could easily more than double performance).

* sysdeps/ieee754/dbl-64/s_sin.c (reduce_sincos_2): Remove function.
(do_sincos_2): Likewise.
(__sin): Remove middle range reduction case.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Remove middle range reduction case.
2021-08-27[PATCH 1/7] sin/cos slow paths: avoid slow paths for small inputsWilco Dijkstra3-26/+26
This series of patches removes the slow paths from sin, cos and sincos. Besides greatly simplifying the implementation, the new version is also much faster for inputs up to PI (41% faster) and for large inputs needing range reduction (27% faster). ULP is ~0.55 with no errors found after testing 1.6 billion inputs across most of the range with mpsin and mpcos. The number of incorrectly rounded results (i.e. ULP >0.5) is at most ~2750 per million inputs between 0.125 and 0.5, the average is ~850 per million between 0 and PI. Tested on AArch64 and x86_64 with no regressions.

The first patch removes the slow paths for the cases where the input is small and doesn't require range reduction. Update ULP tables for sin, cos and sincos on AArch64 and x86_64.

* sysdeps/aarch64/libm-test-ulps: Update ULP for sin, cos, sincos.
* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Remove slow paths for small inputs.
(__cos): Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sin, cos, sincos.
2021-08-27Let time and gettimeofday use vdso by removing old clang workaroundStan Shebs2-8/+2
2021-08-27Do not use ppc-specific long double pack/unpack when compiling with clangStan Shebs1-0/+5
2021-08-27Remove old workaround in power7 logb functions, clang no longer crashes on the inline assemblyStan Shebs3-24/+0
2021-08-27Additional fixes for llvm-asJosh Kunz2-2/+2
Unlike GCC, llvm always uses an integrated assembler, which attempts to recognize all `asm` statements written in the C code. glibc uses some syntactically invalid asm statements to emit constants into assembly that are later extracted with a sed or AWK script. This change fixes two such invalid `asm` statements by wrapping the output in a `.ascii` directive. This does not break the sed/AWK (the same special sequence is output) but it makes the statement syntactically valid. See cf8e3f8757 for a previous fix for the same issue.
2021-08-27Add workaround for infinite looping in ppc vsyscall for sched_getcpu.Stan Shebs1-0/+17
2021-08-27Add an LD_DEBUG=tls option to help debug thread-local storage handling in ld.soStan Shebs1-0/+5
2021-08-27Make multi-arch ifunc support work with clangStan Shebs2-12/+18
2021-08-27Redesign the fastload support for additional performanceAmbrose Feinstein1-9/+15
2021-08-27Fix sense of a test in the static-linking version of ppc get_clockfreqStan Shebs1-1/+1
2021-08-27Makes it compile for AArch64Shu-Chun Weng1-2/+14
De-nesting fix in 83c02e85 changed function signature but AArch64 was untested.
2021-08-27Makes AArch64 assembly acceptable to clangShu-Chun Weng4-14/+14
According to the ARMv8 architecture reference manual, section C7.2.188, the SIMD MOV (to general) instruction format is MOV <Xd>, <Vn>.D[<index>]. gas appears to accept "<Vn>.2D[<index>]" as well, but clang's assembler does not. Cf. https://community.arm.com/developer/ip-products/processors/f/cortex-a-forum/5214/aarch64-assembly-syntax-for-armclang
2021-08-27Include STATIC_PIE_BOOTSTRAP with !NESTING in powerpc64/dl-machine.hSiva Chandra Reddy1-1/+1
2021-08-27Enable relaxed relocations when building certain object files for x86_64.Siva Chandra Reddy1-0/+3
2021-08-27Un-nest an include in dl-reloc-static-pie.c.Siva Chandra Reddy1-1/+1
A corresponding adjustment in sysdeps/x86_64/dl-machine.h has also been made.
2021-08-27Disable -mfloat128 for clang, lets power9 insns into power8 executablesStan Shebs1-39/+41
2021-08-27Also work around clang bctrl issue in get_clockfreq.cStan Shebs1-0/+18
2021-08-27Changes to compile glibc-2.27 on PPC (Power8) with clang.Raman Tenneti5-0/+44
+ Use DOT_MACHINE macro instead of ".machine" instruction.
+ Use __isinf and __isinff instead of builtin versions.
+ In s_logb, s_logbf and s_logbl functions, used float versions to calculate "ret = x & 0x7f800000;" expression.
2021-08-27Undid the dl_enable_fastload environment variable changes.Raman Tenneti1-1/+0
2021-08-27Add "fastload" support.Paul Pluzhnikov1-1/+66
2021-08-27Work around lack of mfppr in clangStan Shebs1-0/+5
2021-08-27Work around mtfsb0 syntax limitation with clangStan Shebs2-0/+16
2021-08-27Avoid passing gcc-specific options to clangStan Shebs1-0/+5
2021-08-27Make asm-based constraints be gcc-onlyStan Shebs1-0/+2
2021-08-27Make xxland syntax gcc-onlyStan Shebs1-4/+4
2021-08-27Add a first approximation of float definitions for ppc clangStan Shebs1-3/+47
2021-08-27Make powerpc .machine directives be gcc-onlyStan Shebs4-3/+12
2021-08-27Make mutex hints gcc-only, improve a type in __arch_compare_and_exchange_bool_32_acqStan Shebs1-2/+2
2021-08-27Make power6 directives be gcc-onlyStan Shebs2-3/+16
2021-08-27Add power9 flag to go with -mfloat128Stan Shebs1-37/+43
2021-08-27Disable more attempts to pass -mlong-double-128 to clangStan Shebs4-0/+14
2021-08-27Disable attempts to pass -mlong-double-128 to clangStan Shebs5-0/+112
2021-08-27Add workaround for infinite looping in ppc vsyscallsStan Shebs2-0/+34
2021-08-27Work around clang crash by skipping apparently-unneeded asmStan Shebs1-0/+4
2021-08-27Work around clang problem with ifuncs and vdsoStan Shebs2-2/+2
2021-08-27Work around a ppc clang inlining bugStan Shebs2-1/+10
2021-08-27Change de-nesting fix to use added argument instead of globalsStan Shebs2-2/+18
2021-08-27Fix regressions in async-safe TLS, add run-time control for debugging, add more commentsStan Shebs1-0/+3
2021-08-27Fix TLS problems not handled by cherrypickStan Shebs3-7/+5
2021-08-27Revert upstream removal of async-safe TLS patches.Brooks Moses4-0/+73
2021-08-27Don't write beyond destination in __mempcpy_avx512_no_vzeroupper (bug 23196)Andreas Schwab1-2/+3
When compiled as mempcpy, the return value is the end of the destination buffer, thus it cannot be used to refer to the start of it. (cherry picked from commit 9aaaab7c6e4176e61c59b0a63c6ba906d875dc0e)
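The difference in return values can be shown with a small sketch. It uses a hypothetical stand-in, sketch_mempcpy, built on plain memcpy; it is not the AVX-512 assembly routine the commit fixes, and the helper names are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for mempcpy (a GNU extension): copy n bytes
   and return a pointer just past the last byte written, whereas
   memcpy returns the start of the destination.  */
void *
sketch_mempcpy (void *dest, const void *src, size_t n)
{
  return (char *) memcpy (dest, src, n) + n;
}

/* Append two buffers back to back by chaining on the returned end
   pointer; returns the total number of bytes written to out.  */
size_t
sketch_concat (char *out, const char *a, size_t alen,
               const char *b, size_t blen)
{
  char *end = sketch_mempcpy (out, a, alen);
  end = sketch_mempcpy (end, b, blen);
  return (size_t) (end - out);
}
```

Because the returned pointer marks the end of the destination, any code that treats it as the start of the buffer would write past the destination, which is the class of bug the commit message describes.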