Age | Commit message (Collapse) | Author | Files | Lines |
|
Enable SFrame stack trace information. The --enable-sframe option
allows the glibc build to compile with SFrame stack trace
information, thus enabling glibc's backtrace to work within glibc.
Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
Reviewed-by: Sam James <sam@gentoo.org>
|
|
This patch adds the necessary bits to enable stack tracing using
SFrame. If the new SFrame stack tracing procedure doesn't
find SFrame-related info, stack tracing falls back on the default
DWARF implementation.
The new SFrame stack tracing procedure is added to the debug/backtrace.c
file; the support functions are added in the sysdeps folder, namely
sframe.h, read-sframe.c and read-sframe.h.
Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
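From the caller's side, the choice between SFrame and DWARF unwinding is invisible: the public backtrace interface in <execinfo.h> stays the same. A minimal sketch of such a caller follows; the helper name dump_backtrace and the 64-frame limit are assumptions made for illustration and are not part of the patch.

```c
#include <execinfo.h>
#include <unistd.h>

/* Hypothetical helper: capture up to MAX_FRAMES return addresses of
   the current call stack and write them, symbolized where possible,
   to standard error.  Returns the number of frames captured.  */
int
dump_backtrace (void)
{
  enum { MAX_FRAMES = 64 };
  void *frames[MAX_FRAMES];

  /* backtrace fills FRAMES with return addresses.  Which unwind data
     it consults internally (SFrame or DWARF) is not visible here.  */
  int n = backtrace (frames, MAX_FRAMES);

  /* backtrace_symbols_fd writes one line per frame without calling
     malloc.  */
  backtrace_symbols_fd (frames, n, STDERR_FILENO);
  return n;
}
```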
|
|
SFrame is supported on the AArch64 architecture.
Enable the SFrame stack tracer for AArch64 too.
Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
SFrame has been well supported on the x86 architecture since binutils 2.41.
Enable it to be used as the default frame tracer.
Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
SFrame, which provides the information needed to do a stack trace, is now
well defined and implemented in Binutils 2.41. The format simply
contains enough information to be able to do a stack trace given a
program counter (PC) value, the stack pointer, and the frame pointer.
The SFrame information is stored in a .sframe ELF section, which is
loaded into its own PT_GNU_SFRAME segment. This support targets
SFrame version 2.
This patch adds the bits to _dl_find_object to recognize and store in
struct dl_find_object the necessary info about the SFrame section.
Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
|
SSE4.1 provides a direct instruction for trunc, which improves
modf/modff performance with a smaller text size. On Ryzen 9 (zen3) with
gcc 14.2.1:
x86_64-v2
reciprocal-throughput       master     patch  difference
workload-0_1                7.9610    7.7914       2.13%
workload-1_maxint           9.4323    7.8021      17.28%
workload-maxint_maxfloat    8.7379    7.8049      10.68%
workload-integral           7.9492    7.7991       1.89%
latency                     master     patch  difference
workload-0_1                7.9511   10.8910     -36.97%
workload-1_maxint          15.8278   10.9048      31.10%
workload-maxint_maxfloat   11.3495   10.9139       3.84%
workload-integral          11.5938   10.9071       5.92%
x86_64-v3
reciprocal-throughput       master     patch  difference
workload-0_1                8.7522    7.9781       8.84%
workload-1_maxint           9.6690    7.9872      17.39%
workload-maxint_maxfloat    8.7634    7.9857       8.87%
workload-integral           8.7397    7.9893       8.59%
latency                     master     patch  difference
workload-0_1                8.7447    9.5589      -9.31%
workload-1_maxint          13.7480    9.5690      30.40%
workload-maxint_maxfloat   10.0092    9.5680       4.41%
workload-integral           9.7518    9.5743       1.82%
For x86_64-v1 the optimization is done through a new ifunc selector.
The AVX variant is added to follow the other SSE4_1 optimizations (like
trunc) and avoid the ifunc for x86_64-v3.
Checked on x86_64-linux-gnu.
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
Undefine TCGETS, TCGETS2, and related ioctl constants in the installed
headers. Extract the correct constants (using the kernel type
definitions) automatically from the UAPI headers. The kernel
constants are available under KERNEL_* names during the glibc build,
computed using the assembler constant extraction mechanism.
Alpha may have to use TCGETS instead of TCGETS2 because TCGETS2
became available in Linux 4.20 only. Introduce ARCH_TCGETS to make
this choice explicit.
To support emulation on powerpc, glibc versions of the termios
constants are added to the emulation code in internal-ioctl.h.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The definition may depend on termios internals.
|
|
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
The use of the termios2 ioctl interface is an implementation detail which
should not bleed into public headers. Remove the PowerPC version of
<bits/ioctls.h> and define the termios2 ioctl numbers in <termios_arch.h>
instead. Also remove the include check from there which is unneeded in an
internal header.
|
|
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
This is required so the generated ld.so.conf files take effect.
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
This way, a nonstandard directory within the testroot containing
libgcc_s.so can actually be picked up and used during the test runs.
Also provide a subdirectory ld.so.conf.d for drop-in configuration.
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
* This needs to be done twice, for test runs with and without
--enable-hardcoded-path-in-tests
* Also, we need to query the used $(CC) for the library location.
* The container tests run ldd and dump the list of needed libraries, then
copy these into the container.
* Without this patch, ldd may not find libgcc_s.so, resulting in "not found"
output and no copying of the library.
* With this patch, the library is picked up independent of its location (as
long as the proper directory is provided) and copied into the testroot.
* This does not mean yet that ld.so in the testroot actually finds it.
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
As discussed here:
https://sourceware.org/pipermail/libc-alpha/2025-July/168492.html
The support for TX lock elision of pthread mutexes is deprecated on
all architectures and will be removed in the next release.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Since mcount_internal is called from mcount/__fentry__ which preserve
only RAX, RCX, RDX, RSI, RDI, R8 and R9, compile mcount.c with
-fno-tree-loop-distribute-patterns -mgeneral-regs-only -mno-apxf
to avoid vector/r16-r31 registers and memcpy/memset in mcount_internal.
This fixes BZ #33134.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
|
|
Update NEWS with tcache improvements.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
The fstatat behaviour when the target is a dangling symlink is different
if flags contains AT_SYMLINK_NOFOLLOW or not.
Add a test for this and document it.
|
|
Document the fstatat behaviour leading to an ENOENT errno, and extend
tests to test the case where filename does not exist.
Signed-off-by: Matteo Croce <teknoraver@meta.com>
|
|
This fixes the cleanup call from __qsort_r
|
|
It is unused since ccdb68e829a3 ("htl: move pthread_once into libc")
|
|
The changes in commit a93d9e03a31ec14405cb3a09aa95413b67067380
("Extend struct r_debug to support multiple namespaces [BZ #15971]")
break the dyninst dynamic instrumentation tool. It brings its
own definition of _r_debug (rather than a declaration).
Furthermore, it turns out it is rather hard to use the proposed
handshake for accessing _r_debug via DT_DEBUG. If applications want
to access _r_debug, they can do so directly if the relevant code has
been built as PIC. To protect against harm from accidental copy
relocations due to linker relaxations, this commit restores copy
relocation support by adjusting both copies if interposition or
copy relocations are in play. Therefore, it is possible to
use a hidden reference in ld.so to access _r_debug.
Only perform the copy relocation initialization if libc has been
loaded. Otherwise, the ld.so search scope can be empty, and the
lookup of the _r_debug symbol may fail.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
It combines updating r_state with the debugger notification.
The second change to _dl_open introduces an additional debugger
notification for dlmopen, but debuggers are expected to ignore it.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
It replaces the ns_debug member of the namespaces. Previously,
the base namespace had an unused ns_debug member.
This change also fixes a concurrency issue: Now _dl_debug_initialize
only updates r_next of the previous namespace's r_debug after the new
r_debug is initialized, so that only the initialized version is
observed. (Client code accessing _r_debug will benefit from load
dependency tracking in CPUs even without explicit barriers.)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
Texinfo 7.2 began warning about the '.info' suffix in the manual names
passed to @ref and similar commands. They eventually plan to stop
stripping the '.info' suffix internally which will lead to broken links
in the manuals without this change.
Signed-off-by: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
|
|
tst-qsort5 was deleted in 709fbd3ec3595f2d1076b4fec09a739327459288.
Therefore remove its redundant libm dependency.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
|
Add DL_ADDRESS_WITHOUT_RELOC to force an address into a general purpose
register to prevent loading it into a vector register directly before
run-time relocation. This is an updated fix for BZ #33088.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
|
The iovec size should account for all substrings between each conversion
specification. For the format:
"abc %s efg"
The list of substrings are:
["abc ", arg, " efg"]
which is 2 times the number of maximum arguments *plus* one.
This issue triggered 'out of bounds' errors by stdlib/tst-bz20544 when
glibc is built with experimental UBSAN support [1].
Besides adjusting the iovec size, a new runtime check is added to
avoid wrong __libc_message_impl usage.
Checked on x86_64-linux-gnu.
[1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/ubsan-undef
Co-authored-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
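The sizing rule can be checked with a small counting sketch. It is not the __libc_message_impl code: both function names are hypothetical, and only "%s" conversions are recognized, which is an assumption made to keep the example short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the pieces (non-empty literal substrings
   plus one entry per "%s" argument) that FMT splits into.  */
size_t
count_message_pieces (const char *fmt)
{
  size_t pieces = 0;
  const char *p = fmt;

  for (;;)
    {
      const char *conv = strstr (p, "%s");

      /* A non-empty literal run before the conversion, or before the
         end of the string, is one piece.  */
      size_t literal_len = conv != NULL ? (size_t) (conv - p) : strlen (p);
      if (literal_len > 0)
        pieces++;
      if (conv == NULL)
        return pieces;

      pieces++;                 /* The argument itself.  */
      p = conv + 2;
    }
}

/* Worst case for NARGS conversions: NARGS arguments plus NARGS + 1
   literal substrings around them.  */
size_t
max_message_pieces (size_t nargs)
{
  return 2 * nargs + 1;
}
```

For "abc %s efg" the count is 3, which equals max_message_pieces (1); a buffer sized for only 2 * nargs entries would be one short.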
|
|
During early startup memcpy or memset must not be called since many targets
use ifuncs for them which won't be initialized yet. Security hardening may
use -ftrivial-auto-var-init=zero which inserts calls to memset. Redirect
memset to memset_generic by including dl-symbol-redir-ifunc.h in cpu-features.c.
This fixes BZ #33112.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Cleanup tcache_init() by using the new __libc_malloc2 interface.
Reviewed-by: Cupertino Miranda <cupertino.miranda@oracle.com>
|
|
Replaced all instances of __builtin_expect with __glibc_unlikely
within malloc.c and malloc-debug.c. This improves the portability
of glibc by avoiding calls to GNU C built-in functions. Since all
the expected results from calls to __builtin_expect were 0,
__glibc_likely was never used as a replacement. Multiple
calls to __builtin_expect within a single if statement have
been replaced with one call to __glibc_unlikely, which wraps
every condition.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
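The shape of the rewrite can be sketched as follows. The macro sketch_unlikely is a local stand-in for __glibc_unlikely so that the example compiles on its own with GCC or Clang, and both check functions are hypothetical; they are not taken from malloc.c.

```c
#include <stddef.h>

/* Local stand-in for glibc's __glibc_unlikely, which expands to
   __builtin_expect with an expected value of 0.  */
#define sketch_unlikely(cond) __builtin_expect ((cond), 0)

/* Before: two separate hints inside one if statement.  */
int
check_before (size_t size, size_t limit, const void *p)
{
  if (__builtin_expect (size > limit, 0) || __builtin_expect (p == NULL, 0))
    return -1;
  return 0;
}

/* After: a single hint wrapping the whole condition.  */
int
check_after (size_t size, size_t limit, const void *p)
{
  if (sketch_unlikely (size > limit || p == NULL))
    return -1;
  return 0;
}
```

Both forms return the same result for every input; only the branch prediction hint is expressed differently.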
|
|
Renamed aligned_OK to misaligned_mem to be similar
to misaligned_chunk, and reversed any assertions using
the macro. Made misaligned_chunk call misaligned_mem after
chunk2mem rather than bitmasking with the malloc alignment
itself, since misaligned_chunk is meant to test the data
chunk itself rather than the header, and the compiler
will optimise the addition so the ternary operator is not
needed.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
Clarify the meaning of renameat arguments.
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
|
|
This reverts commit abc2e954af77f8d10f4f54754520814590e79830.
Reason for revert: Wrong version of the patch.
|
|
Fixes commit c1560f3f75c0e892b5522c16f91b4e303f677094
("elf: Switch to main malloc after final ld.so self-relocation").
Reviewed-by: Frédéric Bérat <fberat@redhat.com>
|
|
The generic implementation is slightly more optimized than the powerpc
one, where it has a more optimized inf/nan check (by not using FP
unit checks, along with branch prediction hints), and removed one
branch by issuing trunc instead of a combination of floor/ceil (which
also generated less code).
On power10 with gcc 14.2.1:
reciprocal-throughput       master     patch  difference
workload-0_1                1.1351    0.9067      20.12%
workload-1_maxint           1.4230    0.9040      36.47%
workload-maxint_maxfloat    1.5038    0.9076      39.65%
workload-integral           1.1280    0.9111      19.23%
latency                     master     patch  difference
workload-0_1                1.1440    2.7117    -137.03%
workload-1_maxint           4.0556    2.7070      33.25%
workload-maxint_maxfloat    3.2122    2.7164      15.43%
workload-integral           3.2381    2.7281      15.75%
Checked on powerpc64le-linux-gnu.
Reviewed-by: Sachin Monga <smonga@linux.ibm.com>
|
|
The generic implementation is slightly more optimized than the powerpc
one, where it has a more optimized inf/nan check (by not using FP
unit checks, along with branch prediction hints), and removed one
branch by issuing trunc instead of a combination of floor/ceil (which
also generated less code).
On power10 with gcc 14.2.1:
reciprocal-throughput       master     patch  difference
workload-0_1                1.5210    1.3942       8.34%
workload-1_maxint           2.0926    1.3940      33.38%
workload-maxint_maxfloat    1.7851    1.3940      21.91%
workload-integral           1.5216    1.3941       8.37%
latency                     master     patch  difference
workload-0_1                1.5928    2.6337     -65.35%
workload-1_maxint           3.2929    2.6337      20.02%
workload-maxint_maxfloat    1.9697    2.6341     -33.73%
workload-integral           2.0597    2.6337     -27.87%
Checked on powerpc64le-linux-gnu.
Reviewed-by: Sachin Monga <smonga@linux.ibm.com>
|
|
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
|
|
Make '__close_nocancel_nostatus' standalone. This is a generic version
analogous to '__close_nocancel'. Platforms may choose to implement an
inline variant instead where the syscall invocation code sequence is
short enough to be beneficial over a function call.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
|
Fix fallout from commit c181840c93d3 ("Consolidate non cancellable close
call") that caused '__close_nocancel_nostatus' to clobber 'errno' on a
close(2) failure, a 2.27 regression.
The problem came from a rewrite from 'close_not_cancel_no_status' to
'__close_nocancel_nostatus' switching from an inline implementation that
used INTERNAL_SYSCALL macro (which stays away from 'errno') to a call to
'__close_nocancel' function that uses INLINE_SYSCALL_CALL macro (which
does poke at 'errno').
Implement '__close_nocancel_nostatus' in terms of INTERNAL_SYSCALL_CALL
then, which leaves 'errno' intact.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
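INTERNAL_SYSCALL_CALL is internal to glibc and never writes 'errno'. Outside glibc, a comparable effect can be approximated by saving and restoring 'errno' around close(2). The sketch below shows that swapped-in technique, not the fix itself; the function name close_preserving_errno is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical user-level analogue of a close that reports no status
   and leaves errno untouched.  It uses save/restore rather than a
   raw system call that bypasses errno.  */
void
close_preserving_errno (int fd)
{
  int saved_errno = errno;

  /* close may set errno on failure, e.g. EBADF for an invalid
     descriptor.  */
  (void) close (fd);

  errno = saved_errno;
}
```

This is useful on error paths where the caller wants to report the original failure rather than a secondary one from the cleanup.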
|
|
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The benchtests/inet_ntop_ipv4 and benchtests/inet_ntop_ipv6 profile
shows that most of the time is spent in costly sprintf operations:
$ perf record ./benchtests/bench-inet_ntop_ipv4 && perf report --stdio
[...]
38.53% bench-inet_ntop libc.so [.] __printf_buffer
18.69% bench-inet_ntop libc.so [.] __printf_buffer_write
11.01% bench-inet_ntop libc.so [.] _itoa_word
8.02% bench-inet_ntop bench-inet_ntop_ipv4 [.] bench_start
6.99% bench-inet_ntop libc.so [.] __memmove_avx_unaligned_erms
3.86% bench-inet_ntop libc.so [.] __strchrnul_avx2
2.82% bench-inet_ntop libc.so [.] __strcpy_avx2
1.90% bench-inet_ntop libc.so [.] inet_ntop4
1.78% bench-inet_ntop libc.so [.] __vsprintf_internal
1.55% bench-inet_ntop libc.so [.] __sprintf_chk
1.18% bench-inet_ntop libc.so [.] __GI___inet_ntop
$ perf record ./benchtests/bench-inet_ntop_ipv6 && perf report --stdio
35.44% bench-inet_ntop libc.so [.] __printf_buffer
14.35% bench-inet_ntop libc.so [.] __printf_buffer_write
10.27% bench-inet_ntop libc.so [.] __GI___inet_ntop
7.93% bench-inet_ntop libc.so [.] _itoa_word
7.00% bench-inet_ntop libc.so [.] __sprintf_chk
6.20% bench-inet_ntop libc.so [.] __vsprintf_internal
5.26% bench-inet_ntop libc.so [.] __strchrnul_avx2
5.05% bench-inet_ntop bench-inet_ntop_ipv6 [.] bench_start
3.70% bench-inet_ntop libc.so [.] __memmove_avx_unaligned_erms
2.11% bench-inet_ntop libc.so [.] __printf_buffer_done
A new implementation is used instead:
* The printf usage is replaced with an expanded function that prints
either an IPv4 octet or an IPv6 quartet;
* The strcpy is replaced with a memcpy (since ABIs usually tend to
optimize the latter);
* For IPv6, the '::' shorthanding is done in-place instead of using
a temporary buffer;
* A temporary buffer is used iff the size is larger than
INET_ADDRSTRLEN/INET6_ADDRSTRLEN;
* Inline is used for both inet_ntop4 and inet_ntop6.
The code is significantly rewritten, so I take it that this requires a new license.
The performance results on aarch64 Neoverse1 with gcc 14.2.1:
* master
aarch64-linux-gnu-master$ ./benchtests/bench-inet_ntop_ipv4
"inet_ntop_ipv4": {
"workload-ipv4-random": {
"duration": 1.43067e+09,
"iterations": 8e+06,
"reciprocal-throughput": 178.572,
"latency": 179.096,
"max-throughput": 5.59997e+06,
"min-throughput": 5.58359e+06
}
aarch64-linux-gnu-master$ ./benchtests/bench-inet_ntop_ipv6
"inet_ntop_ipv6": {
"workload-ipv6-random": {
"duration": 1.68539e+09,
"iterations": 4e+06,
"reciprocal-throughput": 421.307,
"latency": 421.388,
"max-throughput": 2.37357e+06,
"min-throughput": 2.37311e+06
}
}
* patched
aarch64-linux-gnu$ ./benchtests/bench-inet_ntop_ipv4
"inet_ntop_ipv4": {
"workload-ipv4-random": {
"duration": 1.06133e+09,
"iterations": 5.6e+07,
"reciprocal-throughput": 18.8482,
"latency": 19.0565,
"max-throughput": 5.30555e+07,
"min-throughput": 5.24755e+07
}
}
aarch64-linux-gnu$ ./benchtests/bench-inet_ntop_ipv6
"inet_ntop_ipv6": {
"workload-ipv6-random": {
"duration": 1.01246e+09,
"iterations": 2.4e+07,
"reciprocal-throughput": 42.5576,
"latency": 41.8139,
"max-throughput": 2.34976e+07,
"min-throughput": 2.39155e+07
}
}
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
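The first two points of the new implementation (expanded digit printing instead of printf, and memcpy instead of strcpy) can be illustrated for the IPv4 case. This is a simplified sketch, not the code from the patch: the names put_octet and format_ipv4 are hypothetical, and it always goes through a temporary buffer, unlike the buffer handling described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write the decimal form of OCTET at P without
   printf and return a pointer just past the last digit.  */
static char *
put_octet (char *p, unsigned char octet)
{
  if (octet >= 100)
    *p++ = (char) ('0' + octet / 100);
  if (octet >= 10)
    *p++ = (char) ('0' + octet / 10 % 10);
  *p++ = (char) ('0' + octet % 10);
  return p;
}

/* Format the four bytes at SRC as a dotted quad into DST, which has
   room for SIZE bytes.  Returns DST, or NULL if DST is too small.  */
char *
format_ipv4 (const unsigned char src[4], char *dst, size_t size)
{
  char tmp[sizeof "255.255.255.255"];
  char *p = tmp;

  for (int i = 0; i < 4; i++)
    {
      if (i > 0)
        *p++ = '.';
      p = put_octet (p, src[i]);
    }
  *p++ = '\0';

  /* LEN includes the terminating null byte.  */
  size_t len = (size_t) (p - tmp);
  if (len > size)
    return NULL;

  /* The length is already known, so memcpy replaces strcpy.  */
  memcpy (dst, tmp, len);
  return dst;
}
```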
|
|
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
Random IP addresses in the full range. There is no extra workload
to check the effectiveness of the '::' optimization for a set of 0-oct
sets (although it would be a possible workload).
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
Random IP addresses in the full range.
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
Linux kernel >= 6.16 has getrandom() in the vDSO for RISC-V. Enable its
use in glibc so that programs using the glibc high-quality random number
functions benefit from it.
Link: https://git.kernel.org/torvalds/c/ee0d03053e70
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
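Programs do not need to change to benefit: they keep calling the public getrandom interface from <sys/random.h>, and whether the call is served through the vDSO is decided inside glibc. A minimal caller might look like the sketch below; the helper name fill_random is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper: fill BUF with LEN random bytes.  Returns 0 on
   success and -1 on error, with errno set by getrandom.  */
int
fill_random (void *buf, size_t len)
{
  unsigned char *p = buf;

  while (len > 0)
    {
      /* getrandom may return fewer bytes than requested, so loop
         until the buffer is full.  */
      ssize_t n = getrandom (p, len, 0);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;
          return -1;
        }
      p += n;
      len -= (size_t) n;
    }
  return 0;
}
```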
|
|
The master branch started to enable some warnings due to optimization
that were only triggered with -Os [1]. Enable the suppression regardless
of optimization level.
Checked on aarch64-linux-gnu build.
[1] https://gcc.gnu.org/pipermail/gcc-regression/2025-June/082378.html
Reviewed-by: Sam James <sam@gentoo.org>
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
|