Age | Commit message | Author | Files | Lines |
|
These changes were mistakenly left out of the patches that added SIMD
versions of these functions to libmvec.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
Based off the ./sysdeps/ieee754/dbl-64/pow.c implementation,
and provides identical results.
Unlike other libmvec functions, this sets the underflow and overflow bits.
The caller can check these flags, and possibly re-run the calculations with
scalar pow to figure out what is causing the overflow or underflow.
I may not have normalized the data for benchmarking this properly,
but operating only on integers between 0-2^32 and floats between 0.5 and
1 I get the following:
Running 20 times over 32MiB
vector: mean 535.824919 (sd 0.246088)
scalar: mean 286.384220 (sd 0.027630)
Which is a very impressive speed boost.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
Based off the ./sysdeps/ieee754/flt-32/powf.c implementation,
and thus provides identical results.
Unlike other libmvec functions, this sets the underflow and overflow bits.
The caller can check these flags, and possibly re-run the calculations with
scalar powf to figure out what is causing the overflow or underflow.
I may not have normalized the data for benchmarking this properly,
but operating only on floats between 0.5 and 1 I get the following:
Running 20 times over 32MiB
vector: mean 307.659767 (sd 0.203217)
scalar: mean 221.837088 (sd 0.032256)
And with random data there is a decrease in performance:
vector: mean 265.366371 (sd 0.000626)
scalar: mean 279.598078 (sd 0.025592)
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
vec_cmpne was added to GCC 7, requiring an alternative implementation
when building glibc with GCC 6.
|
|
Passes all tests.
Unlike other libmvec functions, this sets the underflow and overflow bits.
The caller can check these flags, and possibly re-run the calculations with
scalar expf to figure out what is causing the overflow or underflow.
The special-case path is not vectorized, and performs much worse than
the scalar code.
Normalized data: 1 to 2^32 converted to double
Running 20 times over 32MiB
vector: mean 563.807107 MiB/s (sd 0.390922)
scalar: mean 226.527824 MiB/s (sd 0.077406)
Random data:
vector: mean 80.175986 MiB/s (sd 1.110948)
scalar: mean 244.738130 MiB/s (sd 0.029561)
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
Passes all tests.
Based off the ./sysdeps/ieee754/dbl-64/e_exp.c implementation,
and thus provides identical results.
Unlike other libmvec functions, this sets the underflow and overflow bits.
The caller can check these flags, and possibly re-run the calculations with
scalar exp to figure out what is causing the overflow or underflow.
Surprisingly, the special-case path performs as well as the normal path
(both of which are vectorized).
Running 20 times over 32MiB
vector: mean 432.263032 MiB/s (sd 0.486733)
scalar: mean 178.646197 MiB/s (sd 0.050013)
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
The built-in vec_float was added to GCC 8.0, requiring an alternative
implementation when using older GCC versions.
|
|
Implements single-precision vector logarithm function. The algorithm is
an adaptation of the one in sysdeps/ieee754/flt-32/e_logf.c, modified for
PPC64 VSX hardware. The version of e_logf.c referenced here is from
commit #bf27d3973d.
The patch has been tested on both Little-Endian and Big-Endian. It
passes all the tests for single-precision logarithm run by make check with
max ULP of 1. Integration into the make check infrastructure is adapted from
similar x86_64 changes in commit #774488f88a.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
Implements double-precision vector logarithm function. The algorithm is
an adaptation of the one in sysdeps/ieee754/dbl-64, modified to exploit
PPC64 VSX hardware. The version of ieee754/dbl-64 is commit #f41b0a43e4.
The patch has been tested on both Little-Endian and Big-Endian. It
passes all the tests for double-precision logarithm run by make check.
Integration into the make check infrastructure closely follows
corresponding changes done for x86_64 in commit #6af25acc7b.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
vec_d_cos2_vsx.c, vec_d_sin2_vsx.c and vec_d_sincos2_vsx.c use
vec_sl(), which is only available on POWER8 processors.
|
|
Implements single-precision vector sincosf function. The polynomial
approximating algorithm is adapted for PPC64 from x86_64 [commit #a6336cc446].
The patch has been tested on PPC64/POWER8 Little Endian and Big Endian.
Testing uses the framework created for libmvec on x86_64 which runs tests on
issuing 'make check'. Tests of the new vector sincosf function all pass.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
Implements double-precision vector sincos function. The polynomial
approximating algorithm is adapted for PPC64 from x86_64 [commit #c9a8c526ac].
The patch has been tested on PPC64/POWER8 Little Endian and Big Endian.
Testing uses the framework created for libmvec on x86_64 which runs tests on
issuing 'make check'. Tests of the new vector sincos function all pass.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
Implements single-precision vector sine function. The polynomial
sine-approximating algorithm is adapted for PPC64 from x86_64 [commit #2a8c2c7b33].
The patch has been tested on PPC64/POWER8 Little Endian and Big Endian.
Testing uses the framework created for libmvec on x86_64 which runs tests on
issuing 'make check'. Tests of the new vector single-precision sine function all pass.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
Implements double-precision vector sine function. The polynomial
sine-approximating algorithm is adapted for PPC64 from x86_64 [commit #4b9c2b707b].
The patch has been tested on PPC64/POWER8 Little Endian and Big Endian.
Testing uses the framework created for libmvec on x86_64 which runs tests on
issuing 'make check'. Tests of the new vector sine function all pass.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
Implements single-precision cosine using VSX vector capability. The polynomial
cosine-approximating algorithm is adapted for PPC64 from x86_64 [commit #04f496d602].
The patch has been tested on PPC64/POWER8 Little Endian and Big Endian. It is
tested using the framework created for libmvec on x86_64 which runs tests on
issuing 'make check'. Tests of the new vector cosine function all pass.
Details on the ABI are found at this link:
<https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt>
But for adjusting the width of operands, details described for the
double-precision cosine implemented earlier apply here. See git
commit #7956c29f07 for that information.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
This is the 1st of 12 patches that will implement libmvec for PPC64 using
VSX hardware capabilities.
Implements double-precision cosine using VSX vector capability. Algorithm for
cosine is from x86_64 [commit #2193311288] adapted to PPC64.
Name-mangling exactly duplicates SSE ISA of the x86_64 ABI. The details are at
<https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt>
The patch has been tested on PPC64/POWER8 Little Endian and Big Endian. It is
tested using the framework created for libmvec on x86_64 which runs tests on
issuing 'make check'. Tests of the new vector cosine function all pass.
Library libmvec is built by default. To disable building it, pass flag
--disable-mathvec to the configure script.
A runtime check prevents vector tests running on systems lacking VSX hardware.
Glibc built with this patch was installed using the procedure outlined at
<https://sourceware.org/glibc/wiki/Testing/Builds>. Compiling against the new
library created a test executable which computes cosines using the vector
version of the function. The results are at most 2-ulps away from the scalar
cosine. That is expected and indicated in the comments describing the
algorithm - as obtained from x86_64 commit #2193311288.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
From the GNU C Library manual, the pkey_set can receive a combination of
PKEY_DISABLE_WRITE and PKEY_DISABLE_ACCESS. However PKEY_DISABLE_ACCESS
is more restrictive than PKEY_DISABLE_WRITE and includes its behavior.
The test expects that after setting
(PKEY_DISABLE_WRITE|PKEY_DISABLE_ACCESS) pkey_get should return the
same. This may not be true as PKEY_DISABLE_ACCESS will succeed in
describing the state of the key in this case.
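The relaxed comparison described above could look roughly like the following. This is a sketch, not the test's actual code: pkey_rights_match is a hypothetical helper, and the fallback constant values are the Linux definitions, supplied only in case the installed headers lack them.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <sys/mman.h>

/* Fallbacks for older headers; the values match the Linux definitions.  */
#ifndef PKEY_DISABLE_ACCESS
# define PKEY_DISABLE_ACCESS 0x1
#endif
#ifndef PKEY_DISABLE_WRITE
# define PKEY_DISABLE_WRITE 0x2
#endif

/* Hypothetical test helper: return 1 if the rights reported by pkey_get
   are an acceptable description of the rights passed to pkey_set.  */
int
pkey_rights_match (int set_rights, int got_rights)
{
  if (got_rights == set_rights)
    return 1;
  /* PKEY_DISABLE_ACCESS already implies that writes are disabled, so an
     implementation may report it alone for the combined setting.  */
  return set_rights == (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)
         && got_rights == PKEY_DISABLE_ACCESS;
}
```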
The pkey behavior during signal handling is different between x86 and
POWER. This change makes the test compatible with both architectures.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
In glibc, gettimeofday can use the vDSO (on power and x86
USE_IFUNC_GETTIMEOFDAY is defined), the gettimeofday syscall, or the 'default'
___gettimeofday() from ./time/gettime.c (as a fallback).
In this patch the last function (___gettimeofday) has been refactored and
moved to ./sysdeps/unix/sysv/linux/gettimeofday.c to be Linux specific.
A new, explicitly 64-bit function, __gettimeofday64, has been introduced to
get 64-bit time from the kernel (by internally calling __clock_gettime64).
Moreover, the 32-bit version, __gettimeofday, has been refactored to use
__gettimeofday64 internally.
The __gettimeofday is now supposed to be used on systems still supporting 32
bit time (__TIMESIZE != 64) - hence the necessary check for time_t potential
overflow and conversion of struct __timeval64 to 32 bit struct timespec.
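The overflow check and narrowing conversion mentioned above can be sketched as follows. The struct and function names are hypothetical stand-ins, not glibc's internal types, and the sketch assumes that an out-of-range tv_sec is reported with EOVERFLOW.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the 64-bit and 32-bit time structures.  */
struct timeval64_sketch { int64_t tv_sec; int64_t tv_usec; };
struct timeval32_sketch { int32_t tv_sec; int32_t tv_usec; };

/* Convert a 64-bit time value to the 32-bit layout.  Returns 0 on
   success, or -1 with errno set to EOVERFLOW if tv_sec does not fit.  */
int
timeval64_to_32 (const struct timeval64_sketch *in,
                 struct timeval32_sketch *out)
{
  if (in->tv_sec < INT32_MIN || in->tv_sec > INT32_MAX)
    {
      errno = EOVERFLOW;
      return -1;
    }
  out->tv_sec = (int32_t) in->tv_sec;
  out->tv_usec = (int32_t) in->tv_usec;
  return 0;
}
```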
The iFUNC vDSO direct call optimization has been removed from both i686 and
powerpc32 (USE_IFUNC_GETTIMEOFDAY is not defined for those architectures
anymore). The Linux kernel does not provide a y2038-safe implementation of
gettimeofday, nor does it plan to provide one in the future; clock_gettime64
should be used instead. Keeping support for this optimization would require
handling another build permutation (!__ASSUME_TIME64_SYSCALLS &&
USE_IFUNC_GETTIMEOFDAY) which adds more complexity and has limited use
(since the idea is to eventually have a y2038 safe glibc build).
Build tests:
./src/scripts/build-many-glibcs.py glibcs
Run-time tests:
- Run specific tests on ARM/x86 32bit systems (qemu):
https://github.com/lmajewski/meta-y2038 and run tests:
https://github.com/lmajewski/y2038-tests/commits/master
Above tests were performed with Y2038 redirection applied as well as without
to test proper usage of both __gettimeofday64 and __gettimeofday.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
[Including some commit message improvement]
|
|
It appears that the ability to change symbolic link modes through such
paths is unintended. On several file systems, the operation fails with
EOPNOTSUPP, even though the symbolic link permissions are updated.
The expected behavior is a failure to update the permissions, without
file system changes.
Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
|
|
This supersedes the init_array sysdeps directory. It allows us to
check for ELF_INITFINI in both C and assembler code, and skip DT_INIT
and DT_FINI processing completely on newer architectures.
A new header file is needed because <dl-machine.h> is incompatible
with assembler code. <sysdep.h> is compatible with assembler code,
but it cannot be included in all assembler files because on some
architectures, it redefines register names, and some assembler files
conflict with that.
<elf-initfini.h> is replicated for legacy architectures which need
DT_INIT/DT_FINI support. New architectures follow the generic default
and disable it.
|
|
The MIPS fallback code handles a frame whose FDE cannot be obtained
(for instance a signal frame) by reading the kernel-allocated signal frame
and adding '2' to the value of 'sc_pc' [1]. The added value is used to
recognize the end of an EH region on mips16 [2].
The fix adjusts the obtained signal frame value and removes the value added
by libgcc, by checking whether the previous frame is a signal frame.
Checked with backtrace and tst-sigcontext-get_pc tests on mips-linux-gnu
and mips64-linux-gnu.
[1] libgcc/config/mips/linux-unwind.h from gcc code.
[2] gcc/config/mips/mips.h from gcc code.
|
|
file_change_detection_for_stat partially initializes
struct file_change_detection in some cases, when the size member
alone determines the outcome of all comparisons. This results
in maybe-uninitialized compiler warnings under sufficiently
aggressive inlining.
Once the implementation is moved into a separate C file, this kind
of inlining is no longer possible, so the compiler warnings are gone.
|
|
The new type struct fd_to_filename makes the allocation of the
backing storage explicit.
Hurd uses /dev/fd, not /proc/self/fd.
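As an illustration of what "explicit backing storage" can mean here, the following sketch lets the caller own the buffer that holds the path. The names fd_path_sketch and fd_path_sketch_init are hypothetical and do not reproduce the real struct fd_to_filename interface. Only the Linux-style /proc/self/fd form is shown.

```c
#include <stdio.h>

/* Hypothetical type: the caller owns the buffer, so no hidden
   allocation is needed.  Three decimal digits per byte of int is a
   safe upper bound, plus one for a sign.  */
struct fd_path_sketch
{
  char buffer[sizeof ("/proc/self/fd/") + 3 * sizeof (int) + 1];
};

/* Format the /proc path for FD into STORAGE and return it.  */
const char *
fd_path_sketch_init (int fd, struct fd_path_sketch *storage)
{
  snprintf (storage->buffer, sizeof storage->buffer,
            "/proc/self/fd/%d", fd);
  return storage->buffer;
}
```

Because the storage is a local variable of the caller, the returned pointer stays valid exactly as long as that variable does.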
Co-Authored-By: Paul Eggert <eggert@cs.ucla.edu>
|
|
All functions that have a format string, which can consume a long double
argument, must have one version for each long double format supported on
a platform. On powerpc64le, these functions currently have two versions
(i.e.: long double with the same format as double, and long double with
IBM Extended Precision format). Support for a third long double format
option (i.e. long double with IEEE long double format) is being prepared
and all the aforementioned functions now have a third version (not yet
exported on the master branch, but the code is in).
For these functions to get selected (during build time), references to
them in user programs (or dependent libraries) must get redirected to
the aforementioned new versions of the functions. This patch installs
the header magic required to perform such redirections.
Notice, however, that since the redirections only happen when
__LONG_DOUBLE_USES_FLOAT128 is set to 1, and no platform (including
powerpc64le) currently does it, no redirections actually happen.
Redirections and the exporting of the new functions will happen at the
same time (when powerpc64le adds ldbl-128ibm-compat to their Implies).
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Reviewed-by: Paul E. Murphy <murphyp@linux.vnet.ibm.com>
|
|
Such constants are used in __USE_EXTERN_INLINES blocks.
|
|
The namespace pollution results in conform test failures if the tests
are run with __USE_EXTERN_INLINES defined (e.g., when configuring with
CC="gcc -O3" CXX="g++ -O3").
|
|
Older GCC versions do not support this extension. Fixes commit f1bdee61797
("x86 tls: Use _Static_assert for TLS access size assertion").
|
|
|
|
|
|
NPTL's pthreadP.h needs internal definitions
|
|
tst-robust8.c prints some mutex internals for nptl debugging; this
needs to be made conditional on being built with nptl.
|
|
It actually is implemented.
|
|
|
|
htl has been widely tested for a long time now with this coherency
checked successfully.
|
|
Store them in the TCB, and use them for accessing _hurd_sigstate.
|
|
|
|
Exporting functions and relying on symbol interposition from libc.so
makes the choice of implementation dependent on DT_NEEDED order, which
is not what some compiler drivers expect.
This commit replaces one magic mechanism (symbol interposition) with
another one (preprocessor-/compiler-based redirection). This makes
the hand-over from the minimal malloc to the full malloc more
explicit.
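The general shape of such an explicit hand-over can be sketched with a macro that routes calls through a function pointer. Everything named below (loader_malloc, loader_malloc_handover, the bump allocator) is hypothetical and much simpler than the loader's real code. In particular, the sketch does not deal with freeing early allocations after the switch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical minimal allocator: a bump pointer over a static arena.  */
static _Alignas (16) unsigned char arena[4096];
static size_t arena_used;

static void *
minimal_malloc (size_t n)
{
  if (n > sizeof arena)
    return NULL;
  size_t rounded = (n + 15) & ~(size_t) 15;
  if (rounded > sizeof arena - arena_used)
    return NULL;
  void *p = arena + arena_used;
  arena_used += rounded;
  return p;
}

static void
minimal_free (void *p)
{
  (void) p;   /* The bump allocator never reuses memory.  */
}

/* Calls are redirected at compile time to these pointers instead of
   relying on which malloc symbol the dynamic linker binds first.  */
static void *(*loader_malloc_impl) (size_t) = minimal_malloc;
static void (*loader_free_impl) (void *) = minimal_free;
#define loader_malloc(n) (loader_malloc_impl (n))
#define loader_free(p) (loader_free_impl (p))

/* The explicit hand-over point: switch to the full implementation.  */
static void
loader_malloc_handover (void)
{
  loader_malloc_impl = malloc;
  loader_free_impl = free;
}

/* Report whether P was served by the minimal allocator.  */
static int
from_arena (const void *p)
{
  uintptr_t v = (uintptr_t) p, lo = (uintptr_t) arena;
  return v >= lo && v < lo + sizeof arena;
}
```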
Removing the ABI symbols is backwards-compatible because libc.so is
always in scope, and the dynamic loader will find the malloc-related
symbols there since commit f0b2132b35248c1f4a80f62a2c38cddcc802aa8c
("ld.so: Support moving versioned symbols between sonames
[BZ #24741]").
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
elf/dl-minimal.c provides a definition of free, so the function
pointer is always non-null, even before the final relocation
of the loader.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
The definitions are moved into a new file, elf/dl-sym-post.h, so that
this code can be used by the dynamic loader as well.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
This generalizes a mechanism used for stack-protector support, so
that it can be applied to other symbols if required.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
Remove extra argument from INTERNAL_SYSCALL_CALL macro call. Fixes
commit bc2eb9321e ("linux: Remove INTERNAL_SYSCALL_DECL").
|
|
With all Linux ABIs using the expected Linux kABI to indicate
syscalls errors, the INTERNAL_SYSCALL_DECL is an empty declaration
on all ports.
This patch removes the 'err' argument from the INTERNAL_SYSCALL* macros
and removes the INTERNAL_SYSCALL_DECL usage.
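With no separate 'err' value, the error state has to be read from the return value itself. A minimal sketch of that convention follows, assuming the usual Linux rule that results in [-4095, -1] encode a negated errno. The helper names are hypothetical, not the glibc macros.

```c
/* Hypothetical helpers for the usual Linux convention: a raw syscall
   result in the range [-4095, -1] encodes -errno.  */
static inline int
syscall_result_is_error (long val)
{
  return (unsigned long) val > -4096UL;
}

/* Extract the errno value from a result already known to be an error.  */
static inline int
syscall_result_errno (long val)
{
  return (int) -val;
}
```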
Checked with a build against all affected ABIs.
|
|
|
|
With all Linux ABIs using the expected Linux kABI to indicate
syscalls errors, there is no need to replicate the INLINE_SYSCALL.
The generic Linux sysdep.h includes errno.h even for !__ASSEMBLER__,
which is OK now, and this allows cleaning up some archaic code that assumes
otherwise.
Checked with a build against all affected ABIs.
|
|
The {INTERNAL,INLINE}_SYSCALL are defined only on s390 sysdep.h.
Checked on s390x-linux-gnu and s390-linux-gnu.
|
|
The riscv INTERNAL_SYSCALL macro might clobber the register
parameter if the argument itself might clobber any register (a function
call for instance).
This patch fixes it by using temporary variables for the expressions
between the register assignments (as indicated by GCC documentation,
6.47.5.2 Specifying Registers for Local Variables).
It is similar to the fix done for MIPS (bug 25523).
Checked with riscv64-linux-gnu-rv64imafdc-lp64d build.
|
|
The microblaze INTERNAL_SYSCALL macro might clobber the register
parameter if the argument itself might clobber any register (a function
call for instance).
This patch fixes it by using temporary variables for the expressions
between the register assignments (as indicated by GCC documentation,
6.47.5.2 Specifying Registers for Local Variables).
It is similar to the fix done for MIPS (bug 25523).
Checked with microblaze-linux-gnu and microblazeel-linux-gnu build.
|
|
It changes the nios INTERNAL_SYSCALL_RAW macro to return a negative
value instead of the 'r2' register value on the 'err' macro argument.
The macro INTERNAL_SYSCALL_DECL is no longer required, and the
INTERNAL_SYSCALL_ERROR_P macro follows the other Linux kABIs.
Checked with a build against nios2-linux-gnu.
|
|
It changes the mips INTERNAL_SYSCALL* and internal_syscall* macros
to return a negative value instead of the 'a3' register value on the
'err' macro argument.
The macro INTERNAL_SYSCALL_DECL is no longer required, and the
INTERNAL_SYSCALL_ERROR_P macro follows the other Linux kABIs.
The redefinition of INTERNAL_VSYSCALL_CALL is also no longer
required.
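The conversion this implies can be sketched as follows, assuming a two-register convention in which one register carries the value (a positive errno on failure) and another flags the error. fold_syscall_result is a hypothetical helper, not one of the glibc macros.

```c
/* Hypothetical helper: fold a (value, error flag) pair, as returned in
   two registers, into the single negative-errno form used by the other
   Linux ports.  */
static inline long
fold_syscall_result (long value, long error_flag)
{
  return error_flag != 0 ? -value : value;
}
```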
Checked on mips64-linux-gnu, mips64n32-linux-gnu, and mips-linux-gnu.
|
|
The mips64 Linux syscall macros differ only in the argument type and
in the requirement to sign-extend values on n32. The headers
are consolidated by parameterizing the arguments with a new type,
__syscall_arg_t, and by defining ARGIFY for n64.
Also, the generic unix mips64 sysdep is essentially the same;
only the load instruction needs to be adjusted depending on the
ABI.
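A sketch of the kind of conversion macro this parameterization relies on is shown below. The names syscall_arg_sketch_t and ARGIFY_SKETCH are hypothetical, and the definition is an assumption about the general technique rather than a quotation from the headers.

```c
/* Hypothetical names.  The argument type is 64 bits wide so that the
   same macro bodies can serve both ABIs.  */
typedef long long int syscall_arg_sketch_t;

/* Convert X to the argument type.  The inner cast goes through the
   type of (X) - (X): for a pointer that is the signed ptrdiff_t, so a
   32-bit pointer is sign-extended rather than zero-extended, and no
   pointer-to-integer size warning is emitted.  */
#define ARGIFY_SKETCH(X) \
  ((syscall_arg_sketch_t) (__typeof__ ((X) - (X))) (X))
```

Integer arguments keep their own signedness under this cast, so a negative int is sign-extended while an unsigned int is zero-extended.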
Checked on mips64-linux-gnu and mips64n32-linux-gnu.
|