|
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic atan2pif.
The code was adapted to glibc style and to use the definitions from
math_config.h (to handle errno, overflow, and underflow).
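For reference, atan2pif computes atan2 scaled by 1/pi. A naive
double-precision reference (an illustration only, not the CORE-MATH
algorithm, and not correctly rounded) shows the intended semantics:

```c
#include <math.h>

/* Illustrative reference only: atan2pif (y, x) ~ atan2 (y, x) / pi.
   The CORE-MATH routine uses a different, correctly rounded algorithm;
   this double-precision shortcut does not guarantee correct rounding.  */
static float
atan2pif_ref (float y, float x)
{
  const double pi = 3.141592653589793;
  return (float) (atan2 ((double) y, (double) x) / pi);
}
```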
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
latency master patched improvement
x86_64 79.4006 70.8726 10.74%
x86_64v2 77.5136 69.1424 10.80%
x86_64v3 71.8050 68.1637 5.07%
aarch64 (Neoverse) 27.8363 24.7700 11.02%
power8 39.3893 17.2929 56.10%
power10 19.7200 16.8187 14.71%
reciprocal-throughput master patched improvement
x86_64 38.3457 30.9471 19.29%
x86_64v2 37.4023 30.3112 18.96%
x86_64v3 33.0713 24.4891 25.95%
aarch64 (Neoverse) 19.3683 15.3259 20.87%
power8 19.5507 8.27165 57.69%
power10 9.05331 7.63775 15.64%
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic asinpif.
The code was adapted to glibc style and to use the definitions from
math_config.h (to handle errno, overflow, and underflow).
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
latency master patched improvement
x86_64 46.4996 41.6126 10.51%
x86_64v2 46.7551 38.8235 16.96%
x86_64v3 42.6235 33.7603 20.79%
aarch64 (Neoverse) 17.4161 14.3604 17.55%
power8 10.7347 9.0193 15.98%
power10 10.6420 9.0362 15.09%
reciprocal-throughput master patched improvement
x86_64 24.7208 16.5544 33.03%
x86_64v2 24.2177 14.8938 38.50%
x86_64v3 20.5617 10.5452 48.71%
aarch64 (Neoverse) 13.4827 7.17613 46.78%
power8 6.46134 3.56089 44.89%
power10 5.79007 3.49544 39.63%
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic acospif.
The code was adapted to glibc style and to use the definitions from
math_config.h (to handle errno, overflow, and underflow).
Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
latency master patched improvement
x86_64 54.8281 42.9070 21.74%
x86_64v2 54.1717 42.7497 21.08%
x86_64v3 49.3552 34.1512 30.81%
aarch64 (Neoverse) 17.9395 14.3733 19.88%
power8 20.3110 8.8609 56.37%
power10 11.3113 8.84067 21.84%
reciprocal-throughput master patched improvement
x86_64 21.2301 14.4803 31.79%
x86_64v2 20.6858 13.9506 32.56%
x86_64v3 16.1944 11.3377 29.99%
aarch64 (Neoverse) 11.4474 7.13282 37.69%
power8 10.6916 3.57547 66.56%
power10 4.64269 3.54145 23.72%
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
As already done in various other places, and as advised by Roland in
https://lists.gnu.org/archive/html/bug-hurd/2012-04/msg00124.html
|
|
The RPC stub will write a string anyway.
|
|
since all symbols that use it are now in libc
Message-ID: <20250209200108.865599-9-gfleury@disroot.org>
|
|
Message-ID: <20250209200108.865599-8-gfleury@disroot.org>
|
|
Message-ID: <20250209200108.865599-7-gfleury@disroot.org>
|
|
Message-ID: <20250209200108.865599-6-gfleury@disroot.org>
|
|
into libc.
Message-ID: <20250209200108.865599-5-gfleury@disroot.org>
|
|
Message-ID: <20250209200108.865599-4-gfleury@disroot.org>
|
|
Message-ID: <20250209200108.865599-3-gfleury@disroot.org>
|
|
Message-ID: <20250209200108.865599-2-gfleury@disroot.org>
|
|
Code used during early static startup in elf/dl-tls.c uses
__mempcpy.
Fixes commit cbd9fd236981717d3d4ee942986ea912e9707c32 ("Consolidate
TLS block allocation for static binaries with ld.so").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
The logic was copied incorrectly from CORE-MATH.
|
|
It's not necessary to introduce temporaries because the compiler
is able to evaluate l_soname just once in constructs like:
l_soname (l) != NULL && strcmp (l_soname (l), LIBC_SO) != 0
|
|
So that they can eventually be called separately from dlopen.
|
|
|
|
This reduces code size and dependencies on ld.so internals from
libc.so.
Fixes commit f4c142bb9fe6b02c0af8cfca8a920091e2dba44b
("arm: Use _dl_find_object on __gnu_Unwind_Find_exidx (BZ 31405)").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
sysdeps/pthread/sem_open.c: call pthread_setcancelstate directly,
since the forward declaration is gone on Hurd too.
Message-ID: <20250201080202.494671-1-gfleury@disroot.org>
|
|
The logic was copied incorrectly from CORE-MATH.
|
|
It was copied incorrectly from CORE-MATH.
|
|
The test uses ARCH_MIN_GUARD_SIZE, and the sysdep.h include is not
required.
|
|
Decorate BSS mappings with [anon: glibc: .bss <file>], for example
[anon: glibc: .bss /lib/libc.so.6]. The string ".bss" is already used
by bionic, so use the same string, but add the filename as well. If the
name would be longer than what the kernel allows, drop the directory
part of the path.
Refactor the glibc.mem.decorate_maps check into a separate function and
use it to avoid assembling a name that would not be used later.
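The naming relies on the Linux anonymous-VMA naming interface
(PR_SET_VMA_ANON_NAME, Linux >= 5.17). A minimal sketch of how such a
label can be applied (a hypothetical helper, not the code added by this
commit):

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Hypothetical helper: label an anonymous mapping so it shows up in
   /proc/<pid>/maps as "[anon: glibc: .bss <file>]".  Requires a kernel
   built with CONFIG_ANON_VMA_NAME and headers defining PR_SET_VMA.  */
static void
name_bss_mapping (void *start, size_t len, const char *file)
{
  char name[80];
  snprintf (name, sizeof name, "glibc: .bss %s", file);
#ifdef PR_SET_VMA_ANON_NAME
  /* Failures are ignored: the name is purely informational.  */
  prctl (PR_SET_VMA, PR_SET_VMA_ANON_NAME,
         (unsigned long) start, len, name);
#endif
}
```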
Signed-off-by: Petr Malat <oss@malat.biz>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Linux 6.13 (662df3e5c3766) added a lightweight way to define guard areas
through the madvise syscall. Instead of protecting the guard region with
PROT_NONE through mprotect, userland can madvise the same area with a
special flag, and the kernel ensures that accessing the area will
trigger a SIGSEGV (as for a PROT_NONE mapping).
The madvise way has the advantage of less kernel memory consumption for
the process page table (one less VMA per guard area), and slightly less
contention in the kernel (also due to fewer VMA areas being tracked).
pthread_create allocates a new thread stack in one of two ways: if a
guard area is set (the default), it allocates the required memory range
with PROT_NONE and then mprotects the usable stack area. Otherwise, if
a guard page is not set, it allocates the region with the required flags.
With MADV_GUARD_INSTALL support, the stack region is allocated with the
required flags and then the guard region is installed. If the kernel
does not support it, the usual way is used instead (and
MADV_GUARD_INSTALL is disabled for future stack creations).
The stack allocation strategy is recorded in the pthread struct, and it
is used in case the guard region needs to be resized. To avoid an extra
field, 'user_stack' is repurposed and renamed to 'stack_mode'.
This patch also adds a proper test for the pthread guard.
I checked on x86_64, aarch64, powerpc64le, and hppa with kernel 6.13.0-rc7.
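A simplified sketch of the described strategy (assuming the guard sits
at the low end of the mapping; this is not the actual nptl
allocate_stack code):

```c
#include <stddef.h>
#include <sys/mman.h>

/* Simplified sketch: map the whole stack read/write, then install the
   guard with MADV_GUARD_INSTALL (Linux >= 6.13); fall back to the
   classic PROT_NONE guard via mprotect if the flag is unknown or the
   kernel rejects it.  */
static void *
allocate_stack_with_guard (size_t size, size_t guardsize)
{
  void *mem = mmap (NULL, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
  if (mem == MAP_FAILED)
    return NULL;
#ifdef MADV_GUARD_INSTALL
  if (madvise (mem, guardsize, MADV_GUARD_INSTALL) == 0)
    return mem;                 /* Guard installed without an extra VMA.  */
#endif
  /* Fallback: classic PROT_NONE guard via mprotect.  */
  if (mprotect (mem, guardsize, PROT_NONE) != 0)
    {
      munmap (mem, size);
      return NULL;
    }
  return mem;
}
```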
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
Message-ID: <20250103103750.870897-7-gfleury@disroot.org>
|
|
Message-ID: <20250103103750.870897-6-gfleury@disroot.org>
|
|
Message-ID: <20250103103750.870897-5-gfleury@disroot.org>
|
|
Message-ID: <20250103103750.870897-4-gfleury@disroot.org>
|
|
I haven't exposed _pthread_mutex_lock, _pthread_mutex_trylock and
_pthread_mutex_unlock in GLIBC_PRIVATE since they aren't used in any
code in libpthread
Message-ID: <20250103103750.870897-3-gfleury@disroot.org>
|
|
Message-ID: <20250103103750.870897-2-gfleury@disroot.org>
|
|
from b386295727d35a83aa3d4750e198cbf8040c9a23
|
|
Add some basic tests for fopen, covering different modes, stream
positioning, and concurrent read/write operations on files.
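For illustration, the behaviours exercised are along these lines (a
minimal standalone example, not the test added by the commit; the file
name is arbitrary):

```c
#include <stdio.h>
#include <string.h>

/* Open a file in update mode, position the stream, and interleave a
   write with a read.  */
int
main (void)
{
  FILE *fp = fopen ("tst-fopen-example.tmp", "w+");
  if (fp == NULL)
    return 1;
  fputs ("hello world", fp);
  /* A positioning call is required between a write and a read.  */
  if (fseek (fp, 6L, SEEK_SET) != 0)
    return 1;
  char buf[6] = { 0 };
  if (fread (buf, 1, 5, fp) != 5 || strcmp (buf, "world") != 0)
    return 1;
  fclose (fp);
  remove ("tst-fopen-example.tmp");
  return 0;
}
```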
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
Include the space needed to store the length of the message itself, in
addition to the message string. This resolves BZ #32582.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Use upper 32 bits of HWCAP.
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
|
|
Linux 6.13 was released with a change that overwrites those bytes.
This means that the check_unused subtest fails.
Update the manual accordingly.
Tested-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
As seen with GCC 11.5 on an AMD Ryzen 9 7950X CPU, with a
-mfpmath=sse, --disable-multi-arch build of glibc.
|
|
- Add GCS marking to some of the tests when the target supports GCS
- Fix tst-ro-dynamic-mod.map linker script to avoid removing
GNU properties
- Add header with macros for GNU properties
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Allocate GCS based on the stack size; this can be used for coroutines
(makecontext) and thread creation (if the kernel allows a user-allocated
GCS).
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
Unlike for BTI, the kernel does not process GCS properties, so update
GL(dl_aarch64_gcs) before the GCS status is set.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
check_gcs is called for each dependency of a DSO, but the GNU property
of ld.so is not processed, so ldso->l_mach.gcs may not be correct.
Just assume ld.so is GCS compatible, regardless of the ELF marking.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
- Handle GCS marking
- Use l_searchlist.r_list for gcs (allows using the
same function for static exe)
Co-authored-by: Yury Khrustalev <yury.khrustalev@arm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Allows using the same function for static exe.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
Use the dynamic linker start code to enable GCS in the dynamically
linked case, after _dl_start returns and before _dl_start_user, which
marks the point after which user code may run.
As in the statically linked case, this ensures that GCS is enabled on a
top-level stack frame.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Use the ARCH_SETUP_TLS hook to enable GCS in the statically linked case.
The system call must be inlined, so that GCS is enabled on a top-level
stack frame that does not return and has no exception handlers above it.
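A conceptual aarch64-only sketch of such an inlined enable call
(assuming the Linux 6.13 uapi names PR_SET_SHADOW_STACK_STATUS and
PR_SHADOW_STACK_ENABLE; glibc's actual code goes through its internal
inlined-syscall macros rather than hand-written asm):

```c
#include <asm/unistd.h>     /* __NR_prctl.  */
#include <linux/prctl.h>    /* PR_SET_SHADOW_STACK_STATUS (Linux >= 6.13 headers).  */

/* The enable must not go through an out-of-line wrapper such as prctl()
   or syscall(): once GCS is on, returning from a frame entered before
   the switch would fail the guarded-stack return check.  */
static inline __attribute__ ((always_inline)) void
enable_gcs (void)
{
#if defined __aarch64__ && defined PR_SET_SHADOW_STACK_STATUS
  register unsigned long x0 asm ("x0") = PR_SET_SHADOW_STACK_STATUS;
  register unsigned long x1 asm ("x1") = PR_SHADOW_STACK_ENABLE;
  register unsigned long x2 asm ("x2") = 0;
  register unsigned long x3 asm ("x3") = 0;
  register unsigned long x4 asm ("x4") = 0;
  register unsigned long x8 asm ("x8") = __NR_prctl;
  asm volatile ("svc #0"
                : "+r" (x0)
                : "r" (x1), "r" (x2), "r" (x3), "r" (x4), "r" (x8)
                : "memory");
#endif
}
```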
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
This tunable controls the Guarded Control Stack (GCS) for the process.
0 = disabled: do not enable GCS
1 = enforced: check markings and fail if any binary is not marked
2 = optional: check markings but keep GCS off if a binary is unmarked
3 = override: enable GCS, markings are ignored
The default is 0, so GCS is disabled; a value of 1 enables GCS.
The status is stored into GL(dl_aarch64_gcs) early and only applied
later, since enabling GCS is tricky: it must happen on a top-level stack
frame. GL is used instead of GLRO because the value may need updates
depending on libraries loaded after read-only protection is applied;
however, library-marking-based GCS setting is not yet implemented.
Describe the new tunable in the manual.
Co-authored-by: Yury Khrustalev <yury.khrustalev@arm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Change the makecontext logic: previously the first setcontext jumped
straight to the user callback function and the return address was set
to __startcontext. This does not work when GCS is enabled, as the
integrity of the return address is protected, so instead the context is
set up such that setcontext jumps to __startcontext, which calls the
user callback (passed in x20).
The map_shadow_stack syscall is used to allocate a suitably sized GCS
(which includes some reserved area to account for altstack signal
handlers and otherwise supports the maximum number of 16-byte aligned
stack frames on the given stack); however, the GCS is never freed, as
the lifetime of the ucontext and the related stack is user managed.
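A rough sketch of the allocation step (not the glibc code; the sizing
here is a placeholder rather than the formula described above, and the
flag macros come from recent kernel headers):

```c
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/mman.h>     /* SHADOW_STACK_* flags with recent headers.  */

/* Allocate a GCS for a user-provided stack via map_shadow_stack.
   Placeholder sizing; the commit derives the size from the stack size
   plus a reserved area for altstack signal handlers.  */
static void *
alloc_gcs_for_stack (size_t stack_size)
{
  size_t gcs_size = stack_size / 2;
#ifdef __NR_map_shadow_stack
  unsigned long flags = 0;
# ifdef SHADOW_STACK_SET_TOKEN
  flags |= SHADOW_STACK_SET_TOKEN;   /* Request an end-of-stack token.  */
# endif
  long ret = syscall (__NR_map_shadow_stack, 0UL, gcs_size, flags);
  if (ret != -1)
    return (void *) ret;
#endif
  return NULL;                       /* Kernel or headers lack GCS support.  */
}
```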
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|