riscv-gnu-toolchain/glibc.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
9 days	Remove futex_supports_pshared	Andreas Schwab	1	-20/+0
	Both NPTL and HTL support PTHREAD_PROCESS_SHARED, and since the removal of the NaCL port there are no other pthread implementations.
2025-09-01	nptl: Fix "Arch-sepecific" typo in comment	Jonathan Wakely	1	-1/+1

2025-09-01	nptl: Drop IS_IN (libpthread) around hidden_proto (__pthread_rwlock_unlock)	Xi Ruoyao	1	-5/+1
	Now libpthread is a dummy library and it no longer contains __pthread_rwlock_unlock at all, thus IS_IN (libpthread) does not make sense here. It seems an left over from commit eb29dcde31e7 ("nptl: Move rwlock functions with forwarders into libc") and it caused libc.so to export an unversioned __pthread_rwlock_unlock on Linux ports introduced after the 2.34 release (loongarch and or1k) but the symbol is not ever supposed to be exported on those new ports. Only since the commit 3b2b88cceeb7 ("elf: early conversion of elf p_flags to mprotect flags") the header dependency change happened to pull in libc-lockP.h which sets hidden_proto (__pthread_rwlock_unlock) correctly, the symbol is no longer exported, breaking the ABI on those ports. Remove this #if as a clean up and to prevent such a mess from happening again. Signed-off-by: Xi Ruoyao <xry111@xry111.site> Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-04-02	stdlib: Fix qsort memory leak if callback throws (BZ 32058)	Adhemerval Zanella	1	-4/+4
	If the input buffer exceeds the stack auxiliary buffer, qsort will malloc a temporary one to call mergesort. Since C++ standard does allow the callback comparison function to throw [1], the glibc implementation can potentially leak memory. The fixes uses a pthread_cleanup_combined_push and pthread_cleanup_combined_pop, so it can work with and without exception enables. The qsort code path that calls malloc now requires some extra setup and a call to __pthread_cleanup_push anmd __pthread_cleanup_pop (which should be ok since they just setup some buffer state). Checked on x86_64-linux-gnu. [1] https://timsong-cpp.github.io/cppwp/n4950/alg.c.library#4 Reviewed-by: DJ Delorie <dj@redhat.com>
2025-03-25	aio_suspend64: Fix clock discrepancy [BZ #32795]	Samuel Thibault	1	-1/+1
	cc5d5852c65e ("y2038: Convert aio_suspend to support 64 bit time") switched from __clock_gettime (CLOCK_REALTIME, &now); to __clock_gettime64 (CLOCK_MONOTONIC, &ts);, but pthread_cond_timedwait is based on the absolute realtime clock, so migrate to using pthread_cond_clockwait to select CLOCK_MONOTONIC. Also fix AIO_MISC_WAIT into passing CLOCK_MONOTONIC to __futex_abstimed_wait64. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-03-13	nptl: PTHREAD_COND_INITIALIZER compatibility with pre-2.41 versions (bug 32786)	Florian Weimer	2	-1/+3
	The new initializer and struct layout does not initialize the __g_signals field in the old struct layout before the change in commit c36fc50781995e6758cae2b6927839d0157f213c ("nptl: Remove g_refs from condition variables"). Bring back fields at the end of struct __pthread_cond_s, so that they are again zero-initialized. Reviewed-by: Sam James <sam@gentoo.org>
2025-03-12	Linux: Add the pthread_gettid_np function (bug 27880)	Florian Weimer	1	-0/+5
	Current Bionic has this function, with enhanced error checking (the undefined case terminates the process). Reviewed-by: Joseph Myers <josmyers@redhat.com>
2025-02-21	nptl: clear the whole rseq area before registration	Michael Jeanson	1	-0/+1
	Due to the extensible nature of the rseq area we can't explictly initialize fields that are not part of the ABI yet. It was agreed with upstream that all new fields will be documented as zero initialized by userspace. Future kernels configured with CONFIG_DEBUG_RSEQ will validate the content of all fields during registration. Replace the explicit field initialization with a memset of the whole rseq area which will cover fields as they are added to future kernels. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-01-30	nptl: Add support for setup guard pages with MADV_GUARD_INSTALL	Adhemerval Zanella	2	-2/+3
	Linux 6.13 (662df3e5c3766) added a lightweight way to define guard areas through madvise syscall. Instead of PROT_NONE the guard region through mprotect, userland can madvise the same area with a special flag, and the kernel ensures that accessing the area will trigger a SIGSEGV (as for PROT_NONE mapping). The madvise way has the advantage of less kernel memory consumption for the process page-table (one less VMA per guard area), and slightly less contention on kernel (also due to the fewer VMA areas being tracked). The pthread_create allocates a new thread stack in two ways: if a guard area is set (the default) it allocates the memory range required using PROT_NONE and then mprotect the usable stack area. Otherwise, if a guard page is not set it allocates the region with the required flags. For the MADV_GUARD_INSTALL support, the stack area region is allocated with required flags and then the guard region is installed. If the kernel does not support it, the usual way is used instead (and MADV_GUARD_INSTALL is disabled for future stack creations). The stack allocation strategy is recorded on the pthread struct, and it is used in case the guard region needs to be resized. To avoid needing an extra field, the 'user_stack' is repurposed and renamed to 'stack_mode'. This patch also adds a proper test for the pthread guard. I checked on x86_64, aarch64, powerpc64le, and hppa with kernel 6.13.0-rc7. Reviewed-by: DJ Delorie <dj@redhat.com>
2025-01-17	nptl: Remove g_refs from condition variables	Malte Skarupke	2	-3/+2
	This variable used to be needed to wait in group switching until all sleepers have confirmed that they have woken. This is no longer needed. Nothing waits on this variable so there is no need to track how many threads are currently asleep in each group. Signed-off-by: Malte Skarupke <malteskarupke@fastmail.fm> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-01-10	nptl: Move the rseq area to the 'extra TLS' block	Michael Jeanson	1	-9/+0
	Move the rseq area to the newly added 'extra TLS' block, this is the last step in adding support for the rseq extended ABI. The size of the rseq area is now dynamic and depends on the rseq features reported by the kernel through the elf auxiliary vector. This will allow applications to use rseq features past the 32 bytes of the original rseq ABI as they become available in future kernels. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-01-10	nptl: Introduce <rseq-access.h> for RSEQ_* accessors	Michael Jeanson	1	-0/+56
	In preparation to move the rseq area to the 'extra TLS' block, we need accessors based on the thread pointer and the rseq offset. The ONCE variant of the accessors ensures single-copy atomicity for loads and stores which is required for all fields once the registration is active. A separate header is required to allow including <atomic.h> which results in an include loop when added to <tcb-access.h>. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-01-10	nptl: Add rseq auxvals	Michael Jeanson	1	-4/+10
	Get the rseq feature size and alignment requirement from the auxiliary vector for use inside the dynamic loader. Use '__rseq_size' directly to store the feature size. If the main thread registration fails or is disabled by tunable, reset the value to 0. This will be used in the TLS block allocator to compute the size and alignment of the rseq area block for the extended ABI support. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-01-09	Move <thread_pointer.h> to kernel-independent sysdeps directories	Florian Weimer	1	-28/+0
	Hurd is expected to use the same thread ABI as Linux. Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
2025-01-01	Update copyright dates with scripts/update-copyrights	Paul Eggert	38	-38/+38

2024-12-31	elf: Do not change stack permission on dlopen/dlmopen	Adhemerval Zanella	1	-6/+0
	If some shared library loaded with dlopen/dlmopen requires an executable stack, either implicitly because of a missing GNU_STACK ELF header (where the ABI default flags implies in the executable bit) or explicitly because of the executable bit from GNU_STACK; the loader will try to set the both the main thread and all thread stacks (from the pthread cache) as executable. Besides the issue where any __nptl_change_stack_perm failure does not undo the previous executable transition (meaning that if the library fails to load, there can be thread stacks with executable stacks), this behavior was used on a CVE [1] as a vector for RCE. This patch changes that if a shared library requires an executable stack, and the current stack is not executable, dlopen fails. The change is done only for dynamically loaded modules, if the program or any dependency requires an executable stack, the loader will still change the main thread before program execution and any thread created with default stack configuration. [1] https://www.qualys.com/2023/07/19/cve-2023-38408/rce-openssh-forwarded-ssh-agent.txt Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-11-12	linux: Add support for getrandom vDSO	Adhemerval Zanella	2	-0/+14
	Linux 6.11 has getrandom() in vDSO. It operates on a thread-local opaque state allocated with mmap using flags specified by the vDSO. Multiple states are allocated at once, as many as fit into a page, and these are held in an array of available states to be doled out to each thread upon first use, and recycled when a thread terminates. As these states run low, more are allocated. To make this procedure async-signal-safe, a simple guard is used in the LSB of the opaque state address, falling back to the syscall if there's reentrancy contention. Also, _Fork() is handled by blocking signals on opaque state allocation (so _Fork() always sees a consistent state even if it interrupts a getrandom() call) and by iterating over the thread stack cache on reclaim_stack. Each opaque state will be in the free states list (grnd_alloc.states) or allocated to a running thread. The cancellation is handled by always using GRND_NONBLOCK flags while calling the vDSO, and falling back to the cancellable syscall if the kernel returns EAGAIN (would block). Since getrandom is not defined by POSIX and cancellation is supported as an extension, the cancellation is handled as 'may occur' instead of 'shall occur' [1], meaning that if vDSO does not block (the expected behavior) getrandom will not act as a cancellation entrypoint. It avoids a pthread_testcancel call on the fast path (different than 'shall occur' functions, like sem_wait()). It is currently enabled for x86_64, which is available in Linux 6.11, and aarch64, powerpc32, powerpc64, loongarch64, and s390x, which are available in Linux 6.12. Link: https://pubs.opengroup.org/onlinepubs/9799919799/nframe.html [1] Co-developed-by: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Jason A. Donenfeld <Jason@zx2c4.com> # x86_64 Tested-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> # x86_64, aarch64 Tested-by: Xi Ruoyao <xry111@xry111.site> # x86_64, aarch64, loongarch64 Tested-by: Stefan Liebler <stli@linux.ibm.com> # s390x
2024-10-08	stdlib: Make abort/_Exit AS-safe (BZ 26275)	Adhemerval Zanella	2	-3/+6
	The recursive lock used on abort does not synchronize with a new process creation (either by fork-like interfaces or posix_spawn ones), nor it is reinitialized after fork(). Also, the SIGABRT unblock before raise() shows another race condition, where a fork or posix_spawn() call by another thread, just after the recursive lock release and before the SIGABRT signal, might create programs with a non-expected signal mask. With the default option (without POSIX_SPAWN_SETSIGDEF), the process can see SIG_DFL for SIGABRT, where it should be SIG_IGN. To fix the AS-safe, raise() does not change the process signal mask, and an AS-safe lock is used if a SIGABRT is installed or the process is blocked or ignored. With the signal mask change removal, there is no need to use a recursive loc. The lock is also taken on both _Fork() and posix_spawn(), to avoid the spawn process to see the abort handler as SIG_DFL. A read-write lock is used to avoid serialize _Fork and posix_spawn execution. Both sigaction (SIGABRT) and abort() requires to lock as writer (since both change the disposition). The fallback is also simplified: there is no need to use a loop of ABORT_INSTRUCTION after _exit() (if the syscall does not terminate the process, the system is broken). The proposed fix changes how setjmp works on a SIGABRT handler, where glibc does not save the signal mask. So usage like the below will now always abort. static volatile int chk_fail_ok; static jmp_buf chk_fail_buf; static void handler (int sig) { if (chk_fail_ok) { chk_fail_ok = 0; longjmp (chk_fail_buf, 1); } else _exit (127); } [...] signal (SIGABRT, handler); [....] chk_fail_ok = 1; if (! setjmp (chk_fail_buf)) { // Something that can calls abort, like a failed fortify function. chk_fail_ok = 0; printf ("FAIL\n"); } Such cases will need to use sigsetjmp instead. The _dl_start_profile calls sigaction through _profil, and to avoid pulling abort() on loader the call is replaced with __libc_sigaction. Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>
2024-09-28	Linux: Block signals around _Fork (bug 32215)	Florian Weimer	1	-0/+7
	This hides the inconsistent TCB state (missing robust mutex list) from signal handlers. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-08-23	nptl: Fix Race conditions in pthread cancellation [BZ#12683]	Adhemerval Zanella	3	-17/+68
	The current racy approach is to enable asynchronous cancellation before making the syscall and restore the previous cancellation type once the syscall returns, and check if cancellation has happen during the cancellation entrypoint. As described in BZ#12683, this approach shows 2 problems: 1. Cancellation can act after the syscall has returned from the kernel, but before userspace saves the return value. It might result in a resource leak if the syscall allocated a resource or a side effect (partial read/write), and there is no way to program handle it with cancellation handlers. 2. If a signal is handled while the thread is blocked at a cancellable syscall, the entire signal handler runs with asynchronous cancellation enabled. This can lead to issues if the signal handler call functions which are async-signal-safe but not async-cancel-safe. For the cancellation to work correctly, there are 5 points at which the cancellation signal could arrive: [ ... )[ ... )[ syscall ]( ... 1 2 3 4 5 1. Before initial testcancel, e.g. [... testcancel) 2. Between testcancel and syscall start, e.g. [testcancel...syscall start) 3. While syscall is blocked and no side effects have yet taken place, e.g. [ syscall ] 4. Same as 3 but with side-effects having occurred (e.g. a partial read or write). 5. After syscall end e.g. (syscall end...] And libc wants to act on cancellation in cases 1, 2, and 3 but not in cases 4 or 5. For the 4 and 5 cases, the cancellation will eventually happen in the next cancellable entrypoint without any further external event. The proposed solution for each case is: 1. Do a conditional branch based on whether the thread has received a cancellation request; 2. It can be caught by the signal handler determining that the saved program counter (from the ucontext_t) is in some address range beginning just before the "testcancel" and ending with the syscall instruction. 3. SIGCANCEL can be caught by the signal handler and determine that the saved program counter (from the ucontext_t) is in the address range beginning just before "testcancel" and ending with the first uninterruptable (via a signal) syscall instruction that enters the kernel. 4. In this case, except for certain syscalls that ALWAYS fail with EINTR even for non-interrupting signals, the kernel will reset the program counter to point at the syscall instruction during signal handling, so that the syscall is restarted when the signal handler returns. So, from the signal handler's standpoint, this looks the same as case 2, and thus it's taken care of. 5. For syscalls with side-effects, the kernel cannot restart the syscall; when it's interrupted by a signal, the kernel must cause the syscall to return with whatever partial result is obtained (e.g. partial read or write). 6. The saved program counter points just after the syscall instruction, so the signal handler won't act on cancellation. This is similar to 4. since the program counter is past the syscall instruction. So The proposed fixes are: 1. Remove the enable_asynccancel/disable_asynccancel function usage in cancellable syscall definition and instead make them call a common symbol that will check if cancellation is enabled (__syscall_cancel at nptl/cancellation.c), call the arch-specific cancellable entry-point (__syscall_cancel_arch), and cancel the thread when required. 2. Provide an arch-specific generic system call wrapper function that contains global markers. These markers will be used in SIGCANCEL signal handler to check if the interruption has been called in a valid syscall and if the syscalls has side-effects. A reference implementation sysdeps/unix/sysv/linux/syscall_cancel.c is provided. However, the markers may not be set on correct expected places depending on how INTERNAL_SYSCALL_NCS is implemented by the architecture. It is expected that all architectures add an arch-specific implementation. 3. Rewrite SIGCANCEL asynchronous handler to check for both canceling type and if current IP from signal handler falls between the global markers and act accordingly. 4. Adjust libc code to replace LIBC_CANCEL_ASYNC/LIBC_CANCEL_RESET to use the appropriate cancelable syscalls. 5. Adjust 'lowlevellock-futex.h' arch-specific implementations to provide cancelable futex calls. Some architectures require specific support on syscall handling: * On i386 the syscall cancel bridge needs to use the old int80 instruction because the optimized vDSO symbol the resulting PC value for an interrupted syscall points to an address outside the expected markers in __syscall_cancel_arch. It has been discussed in LKML [1] on how kernel could help userland to accomplish it, but afaik discussion has stalled. Also, sysenter should not be used directly by libc since its calling convention is set by the kernel depending of the underlying x86 chip (check kernel commit 30bfa7b3488bfb1bb75c9f50a5fcac1832970c60). * mips o32 is the only kABI that requires 7 argument syscall, and to avoid add a requirement on all architectures to support it, mips support is added with extra internal defines. Checked on aarch64-linux-gnu, arm-linux-gnueabihf, powerpc-linux-gnu, powerpc64-linux-gnu, powerpc64le-linux-gnu, i686-linux-gnu, and x86_64-linux-gnu. [1] https://lkml.org/lkml/2016/3/8/1105 Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-07-09	Linux: Make __rseq_size useful for feature detection (bug 31965)	Florian Weimer	1	-7/+1
	The __rseq_size value is now the active area of struct rseq (so 20 initially), not the full struct size including padding at the end (32 initially). Update misc/tst-rseq to print some additional diagnostics. Reviewed-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2024-07-03	nptl: fix potential merge of __rseq_* relro symbols	Michael Jeanson	1	-8/+6
	While working on a patch to add support for the extensible rseq ABI, we came across an issue where a new 'const' variable would be merged with the existing '__rseq_size' variable. We tracked this to the use of '-fmerge-all-constants' which allows the compiler to merge identical constant variables. This means that all 'const' variables in a compile unit that are of the same size and are initialized to the same value can be merged. In this specific case, on 32 bit systems 'unsigned int' and 'ptrdiff_t' are both 4 bytes and initialized to 0 which should trigger the merge. However for reasons we haven't delved into when the attribute 'section (".data.rel.ro")' is added to the mix, only variables of the same exact types are merged. As far as we know this behavior is not specified anywhere and could change with a new compiler version, hence this patch. Move the definitions of these variables into an assembler file and add hidden writable aliases for internal use. This has the added bonus of removing the asm workaround to set the values on rseq registration. Tested on Debian 12 with GCC 12.2. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-04-02	Always define __USE_TIME_BITS64 when 64 bit time_t is used	Adhemerval Zanella	1	-9/+9
	It was raised on libc-help [1] that some Linux kernel interfaces expect the libc to define __USE_TIME_BITS64 to indicate the time_t size for the kABI. Different than defined by the initial y2038 design document [2], the __USE_TIME_BITS64 is only defined for ABIs that support more than one time_t size (by defining the _TIME_BITS for each module). The 64 bit time_t redirects are now enabled using a different internal define (__USE_TIME64_REDIRECTS). There is no expected change in semantic or code generation. Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu, and arm-linux-gnueabi [1] https://sourceware.org/pipermail/libc-help/2024-January/006557.html [2] https://sourceware.org/glibc/wiki/Y2038ProofnessDesign Reviewed-by: DJ Delorie <dj@redhat.com>
2024-01-01	Update copyright dates with scripts/update-copyrights	Paul Eggert	37	-37/+37

2023-05-30	Fix misspellings in sysdeps/ -- BZ 25337	Paul Pluzhnikov	2	-4/+4

2023-05-01	hurd 64bit: Add missing libanl	Samuel Thibault	1	-0/+1
	The move of libanl to libc was in glibc 2.34 for nptl only.
2023-04-20	Created tunable to force small pages on stack allocation.	Cupertino Miranda	1	-0/+6
	Created tunable glibc.pthread.stack_hugetlb to control when hugepages can be used for stack allocation. In case THP are enabled and glibc.pthread.stack_hugetlb is set to 0, glibc will madvise the kernel not to use allow hugepages for stack allocations. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2023-03-29	Remove --enable-tunables configure option	Adhemerval Zanella Netto	4	-16/+0
	And make always supported. The configure option was added on glibc 2.25 and some features require it (such as hwcap mask, huge pages support, and lock elisition tuning). It also simplifies the build permutations. Changes from v1: * Remove glibc.rtld.dynamic_sort changes, it is orthogonal and needs more discussion. * Cleanup more code. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-01-06	Update copyright dates with scripts/update-copyrights	Joseph Myers	37	-37/+37

2022-09-26	Use atomic_exchange_release/acquire	Wilco Dijkstra	2	-3/+3
	Rename atomic_exchange_rel/acq to use atomic_exchange_release/acquire since these map to the standard C11 atomic builtins. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-09-23	Use C11 atomics instead of atomic_decrement_and_test	Wilco Dijkstra	1	-1/+1
	Replace atomic_decrement_and_test with atomic_fetch_add_relaxed. These are simple counters which do not protect any shared data from concurrent accesses. Also remove the unused file cond-perf.c. Passes regress on AArch64. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-09-20	m68k: Enforce 4-byte alignment on internal locks (BZ #29537)	Adhemerval Zanella	2	-2/+9
	A new internal definition, __LIBC_LOCK_ALIGNMENT, is used to force the 4-byte alignment only for m68k, other architecture keep the natural alignment of the type used internally (and hppa does not require 16-byte alignment for kernel-assisted CAS). Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-09-13	Use relaxed atomics since there is no MO dependence	Wilco Dijkstra	1	-7/+1
	Replace the 3 uses of atomic_bit_set and atomic_bit_test_set with atomic_fetch_or_relaxed. Using relaxed MO is correct since the atomics are used to ensure memory is released only once. Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-08-11	libio: Improve performance of IO locks	Wilco Dijkstra	1	-9/+19
	Improve performance of recursive IO locks by adding a fast path for the single-threaded case. To reduce the number of memory accesses for locking/unlocking, only increment the recursion counter if the lock is already taken. On Neoverse V1, a microbenchmark with many small freads improved by 2.9x. Multithreaded performance improved by 2%. Reviewed-by: Cristian Rodríguez <crrodriguez@opensuse.org>
2022-08-03	nptl: Remove uses of assert_perror	Florian Weimer	1	-8/+3
	__pthread_sigmask cannot actually fail with valid pointer arguments (it would need a really broken seccomp filter), and we do not check for errors elsewhere. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-07-27	arc4random: simplify design for better safety	Jason A. Donenfeld	1	-2/+0
	Rather than buffering 16 MiB of entropy in userspace (by way of chacha20), simply call getrandom() every time. This approach is doubtlessly slower, for now, but trying to prematurely optimize arc4random appears to be leading toward all sorts of nasty properties and gotchas. Instead, this patch takes a much more conservative approach. The interface is added as a basic loop wrapper around getrandom(), and then later, the kernel and libc together can work together on optimizing that. This prevents numerous issues in which userspace is unaware of when it really must throw away its buffer, since we avoid buffering all together. Future improvements may include userspace learning more from the kernel about when to do that, which might make these sorts of chacha20-based optimizations more possible. The current heuristic of 16 MiB is meaningless garbage that doesn't correspond to anything the kernel might know about. So for now, let's just do something conservative that we know is correct and won't lead to cryptographic issues for users of this function. This patch might be considered along the lines of, "optimization is the root of all evil," in that the much more complex implementation it replaces moves too fast without considering security implications, whereas the incremental approach done here is a much safer way of going about things. Once this lands, we can take our time in optimizing this properly using new interplay between the kernel and userspace. getrandom(0) is used, since that's the one that ensures the bytes returned are cryptographically secure. But on systems without it, we fallback to using /dev/urandom. This is unfortunate because it means opening a file descriptor, but there's not much of a choice. Secondly, as part of the fallback, in order to get more or less the same properties of getrandom(0), we poll on /dev/random, and if the poll succeeds at least once, then we assume the RNG is initialized. This is a rough approximation, as the ancient "non-blocking pool" initialized after the "blocking pool", not before, and it may not port back to all ancient kernels, though it does to all kernels supported by glibc (≥3.2), so generally it's the best approximation we can do. The motivation for including arc4random, in the first place, is to have source-level compatibility with existing code. That means this patch doesn't attempt to litigate the interface itself. It does, however, choose a conservative approach for implementing it. Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: Cristian Rodríguez <crrodriguez@opensuse.org> Cc: Paul Eggert <eggert@cs.ucla.edu> Cc: Mark Harris <mark.hsj@gmail.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: linux-crypto@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-07-22	stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)	Adhemerval Zanella Netto	1	-0/+2
	The implementation is based on scalar Chacha20 with per-thread cache. It uses getrandom or /dev/urandom as fallback to get the initial entropy, and reseeds the internal state on every 16MB of consumed buffer. To improve performance and lower memory consumption the per-thread cache is allocated lazily on first arc4random functions call, and if the memory allocation fails getentropy or /dev/urandom is used as fallback. The cache is also cleared on thread exit iff it was initialized (so if arc4random is not called it is not touched). Although it is lock-free, arc4random is still not async-signal-safe (the per thread state is not updated atomically). The ChaCha20 implementation is based on RFC8439 [1], omitting the final XOR of the keystream with the plaintext because the plaintext is a stream of zeros. This strategy is similar to what OpenBSD arc4random does. The arc4random_uniform is based on previous work by Florian Weimer, where the algorithm is based on Jérémie Lumbroso paper Optimal Discrete Uniform Generation from Coin Flips, and Applications (2013) [2], who credits Donald E. Knuth and Andrew C. Yao, The complexity of nonuniform random number generation (1976), for solving the general case. The main advantage of this method is the that the unit of randomness is not the uniform random variable (uint32_t), but a random bit. It optimizes the internal buffer sampling by initially consuming a 32-bit random variable and then sampling byte per byte. Depending of the upper bound requested, it might lead to better CPU utilization. Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu. Co-authored-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> [1] https://datatracker.ietf.org/doc/html/rfc8439 [2] https://arxiv.org/pdf/1304.1916.pdf
2022-07-05	Replace __libc_multiple_threads with __libc_single_threaded	Adhemerval Zanella	1	-1/+1
	And also fixes the SINGLE_THREAD_P macro for SINGLE_THREAD_BY_GLOBAL, since header inclusion single-thread.h is in the wrong order, the define needs to come before including sysdeps/unix/sysdep.h. The macro is now moved to a per-arch single-threade.h header. The SINGLE_THREAD_P is used on some more places. Checked on aarch64-linux-gnu and x86_64-linux-gnu.
2022-06-24	misc: Optimize internal usage of __libc_single_threaded	Adhemerval Zanella	1	-1/+1
	By adding an internal alias to avoid the GOT indirection. On some architecture, __libc_single_thread may be accessed through copy relocations and thus it requires to update also the copies default copy. This is done by adding a new internal macro, libc_hidden_data_{proto,def}, which has an addition argument that specifies the alias name (instead of default __GI_ one). Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Fangrui Song <maskray@google.com>
2022-05-09	nptl: Add backoff mechanism to spinlock loop	Wangyang Guo	2	-0/+36
	When mutiple threads waiting for lock at the same time, once lock owner releases the lock, waiters will see lock available and all try to lock, which may cause an expensive CAS storm. Binary exponential backoff with random jitter is introduced. As try-lock attempt increases, there is more likely that a larger number threads compete for adaptive mutex lock, so increase wait time in exponential. A random jitter is also added to avoid synchronous try-lock from other threads. v2: Remove read-check before try-lock for performance. v3: 1. Restore read-check since it works well in some platform. 2. Make backoff arch dependent, and enable it for x86_64. 3. Limit max backoff to reduce latency in large critical section. v4: Fix strict-prototypes error in sysdeps/nptl/pthread_mutex_backoff.h v5: Commit log updated for regression in large critical section. Result of pthread-mutex-locks bench Test Platform: Xeon 8280L (2 socket, 112 CPUs in total) First Row: thread number First Col: critical section length Values: backoff vs upstream, time based, low is better non-critical-length: 1 1 2 4 8 16 32 64 112 140 0 0.99 0.58 0.52 0.49 0.43 0.44 0.46 0.52 0.54 1 0.98 0.43 0.56 0.50 0.44 0.45 0.50 0.56 0.57 2 0.99 0.41 0.57 0.51 0.45 0.47 0.48 0.60 0.61 4 0.99 0.45 0.59 0.53 0.48 0.49 0.52 0.64 0.65 8 1.00 0.66 0.71 0.63 0.56 0.59 0.66 0.72 0.71 16 0.97 0.78 0.91 0.73 0.67 0.70 0.79 0.80 0.80 32 0.95 1.17 0.98 0.87 0.82 0.86 0.89 0.90 0.90 64 0.96 0.95 1.01 1.01 0.98 1.00 1.03 0.99 0.99 128 0.99 1.01 1.01 1.17 1.08 1.12 1.02 0.97 1.02 non-critical-length: 32 1 2 4 8 16 32 64 112 140 0 1.03 0.97 0.75 0.65 0.58 0.58 0.56 0.70 0.70 1 0.94 0.95 0.76 0.65 0.58 0.58 0.61 0.71 0.72 2 0.97 0.96 0.77 0.66 0.58 0.59 0.62 0.74 0.74 4 0.99 0.96 0.78 0.66 0.60 0.61 0.66 0.76 0.77 8 0.99 0.99 0.84 0.70 0.64 0.66 0.71 0.80 0.80 16 0.98 0.97 0.95 0.76 0.70 0.73 0.81 0.85 0.84 32 1.04 1.12 1.04 0.89 0.82 0.86 0.93 0.91 0.91 64 0.99 1.15 1.07 1.00 0.99 1.01 1.05 0.99 0.99 128 1.00 1.21 1.20 1.22 1.25 1.31 1.12 1.10 0.99 non-critical-length: 128 1 2 4 8 16 32 64 112 140 0 1.02 1.00 0.99 0.67 0.61 0.61 0.61 0.74 0.73 1 0.95 0.99 1.00 0.68 0.61 0.60 0.60 0.74 0.74 2 1.00 1.04 1.00 0.68 0.59 0.61 0.65 0.76 0.76 4 1.00 0.96 0.98 0.70 0.63 0.63 0.67 0.78 0.77 8 1.01 1.02 0.89 0.73 0.65 0.67 0.71 0.81 0.80 16 0.99 0.96 0.96 0.79 0.71 0.73 0.80 0.84 0.84 32 0.99 0.95 1.05 0.89 0.84 0.85 0.94 0.92 0.91 64 1.00 0.99 1.16 1.04 1.00 1.02 1.06 0.99 0.99 128 1.00 1.06 0.98 1.14 1.39 1.26 1.08 1.02 0.98 There is regression in large critical section. But adaptive mutex is aimed for "quick" locks. Small critical section is more common when users choose to use adaptive pthread_mutex. Signed-off-by: Wangyang Guo <wangyang.guo@intel.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-04-26	posix: Remove unused definition on _Fork	Adhemerval Zanella	1	-3/+0
	Checked on x86_64-linux-gnu.
2022-04-14	nptl: Handle spurious EINTR when thread cancellation is disabled (BZ#29029)	Adhemerval Zanella	2	-4/+1
	Some Linux interfaces never restart after being interrupted by a signal handler, regardless of the use of SA_RESTART [1]. It means that for pthread cancellation, if the target thread disables cancellation with pthread_setcancelstate and calls such interfaces (like poll or select), it should not see spurious EINTR failures due the internal SIGCANCEL. However recent changes made pthread_cancel to always sent the internal signal, regardless of the target thread cancellation status or type. To fix it, the previous semantic is restored, where the cancel signal is only sent if the target thread has cancelation enabled in asynchronous mode. The cancel state and cancel type is moved back to cancelhandling and atomic operation are used to synchronize between threads. The patch essentially revert the following commits: 8c1c0aae20 nptl: Move cancel type out of cancelhandling 2b51742531 nptl: Move cancel state out of cancelhandling 26cfbb7162 nptl: Remove CANCELING_BITMASK However I changed the atomic operation to follow the internal C11 semantic and removed the MACRO usage, it simplifies a bit the resulting code (and removes another usage of the old atomic macros). Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu, and powerpc64-linux-gnu. [1] https://man7.org/linux/man-pages/man7/signal.7.html Reviewed-by: Florian Weimer <fweimer@redhat.com> Tested-by: Aurelien Jarno <aurelien@aurel32.net>
2022-02-02	Linux: Use ptrdiff_t for __rseq_offset	Florian Weimer	1	-2/+2
	This matches the data size initial-exec relocations use on most targets. Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-01-01	Update copyright dates with scripts/update-copyrights	Paul Eggert	36	-36/+36
	I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 7061 files FOO. I then removed trailing white space from math/tgmath.h, support/tst-support-open-dev-null-range.c, and sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following obscure pre-commit check failure diagnostics from Savannah. I don't know why I run into these diagnostics whereas others evidently do not. remote: * 912-#endif remote: * 913: remote: * 914- remote: * error: lines with trailing whitespace found ... remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
2021-12-09	nptl: Add public rseq symbols and <sys/rseq.h>	Florian Weimer	1	-1/+22
	The relationship between the thread pointer and the rseq area is made explicit. The constant offset can be used by JIT compilers to optimize rseq access (e.g., for really fast sched_getcpu). Extensibility is provided through __rseq_size and __rseq_flags. (In the future, the kernel could request a different rseq size via the auxiliary vector.) Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-09	nptl: Add glibc.pthread.rseq tunable to control rseq registration	Florian Weimer	3	-1/+17
	This tunable allows applications to register the rseq area instead of glibc. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-12-09	nptl: Add rseq registration	Florian Weimer	1	-2/+6
	The rseq area is placed directly into struct pthread. rseq registration failure is not treated as an error, so it is possible that threads run with inconsistent registration status. <sys/rseq.h> is not yet installed as a public header. Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-12-09	nptl: Introduce THREAD_GETMEM_VOLATILE	Florian Weimer	1	-0/+2
	This will be needed for rseq TCB access. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-09	nptl: Introduce <tcb-access.h> for THREAD_* accessors	Florian Weimer	1	-0/+30
	These are common between most architectures. Only the x86 targets are outliers. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-09	nptl: Add <thread_pointer.h> for defining __thread_pointer	Florian Weimer	1	-0/+28
	<tls.h> already contains a definition that is quite similar, but it is not consistent across architectures. Only architectures for which rseq support is added are covered. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>