aboutsummaryrefslogtreecommitdiff
path: root/sysdeps
AgeCommit message (Collapse)AuthorFilesLines
2022-05-17rtld: Remove DL_ARGV_NOT_RELRO and make _dl_skip_args constSzabolcs Nagy11-145/+3
_dl_skip_args is always 0, so the target specific code that modifies argv after relro protection is applied is no longer used. After the patch relro protection is applied to _dl_argv consistently on all targets. Reviewed-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-05-17rtld: Use generic argv adjustment in ld.so [BZ #23293]Szabolcs Nagy1-17/+13
When an executable is invoked as ./ld.so [ld.so-args] ./exe [exe-args] then the argv is adujusted in ld.so before calling the entry point of the executable so ld.so args are not visible to it. On most targets this requires moving argv, env and auxv on the stack to ensure correct stack alignment at the entry point. This had several issues: - The code for this adjustment on the stack is written in asm as part of the target specific ld.so _start code which is hard to maintain. - The adjustment is done after _dl_start returns, where it's too late to update GLRO(dl_auxv), as it is already readonly, so it points to memory that was clobbered by the adjustment. This is bug 23293. - _environ is also wrong in ld.so after the adjustment, but it is likely not used after _dl_start returns so this is not user visible. - _dl_argv was updated, but for this it was moved out of relro, which changes security properties across targets unnecessarily. This patch introduces a generic _dl_start_args_adjust function that handles the argument adjustments after ld.so processed its own args and before relro protection is applied. The same algorithm is used on all targets, _dl_skip_args is now 0, so existing target specific adjustment code is no longer used. The bug affects aarch64, alpha, arc, arm, csky, ia64, nios2, s390-32 and sparc, other targets don't need the change in principle, only for consistency. The GNU Hurd start code relied on _dl_skip_args after dl_main returned, now it checks directly if args were adjusted and fixes the Hurd startup data accordingly. Follow up patches can remove _dl_skip_args and DL_ARGV_NOT_RELRO. Tested on aarch64-linux-gnu and cross tested on i686-gnu. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-05-16Remove dl-librecon.h header.Adhemerval Zanella4-87/+0
The Linux version used by i686 and m68k provide three overrrides for generic code: 1. DISTINGUISH_LIB_VERSIONS to print additional information when libc5 is used by a dependency. 2. EXTRA_LD_ENVVARS to that enabled LD_LIBRARY_VERSION environment variable. 3. EXTRA_UNSECURE_ENVVARS to add two environment variables related to aout support. None are really requires, it has some decades since libc5 or aout suppported was removed and Linux even remove support for aout files. The LD_LIBRARY_VERSION is also dead code, dl_correct_cache_id is not used anywhere. Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-05-16elf: Remove ldconfig kernel version checkAdhemerval Zanella10-91/+69
Now that it was removed on libc.so.
2022-05-16Remove kernel version checkAdhemerval Zanella8-192/+7
The kernel version check is used to avoid glibc to run on older kernels where some syscall are not available and fallback code are not enabled to handle graciously fail. However, it does not prevent if the kernel does not correctly advertise its version through vDSO note, uname or procfs. Also kernel version checks are sometime not desirable by users, where they want to deploy on different system with different kernel version knowing the minimum set of syscall is always presented on such systems. The kernel version check has been removed along with the LD_ASSUME_KERNEL environment variable. The minimum kernel used to built glibc is still provided through NT_GNU_ABI_TAG ELF note and also printed when libc.so is issued. Checked on x86_64-linux-gnu.
2022-05-16linux: Use /sys/devices/system/cpu on __get_nprocs_conf (BZ#28991)Adhemerval Zanella1-32/+4
Currently on Linux __get_nprocs_conf first tries to enumerate the cpus present in the system by iterating on /sys/devices/system/cpuX directories. This only enumerates the CPUs that are present in system (but possibly offline), not taking in account possible CPU that might added in the system through hotplugging. Linux provides the maximum number of configured cpus on the /sys/devices/system/cpu file. Although it might present a larger value of possible active CPUs on some system (where kernel either get the information from firmaware or is configured at boot time), the information is what kernel presents to userland. This also change the returned value of _SC_NPROCESSORS_CONF, which aligns as the maximum configure cpu in the system. Checked on x86_64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-05-16csu: Implement and use _dl_early_allocate during static startupFlorian Weimer2-0/+87
This implements mmap fallback for a brk failure during TLS allocation. scripts/tls-elf-edit.py is updated to support the new patching method. The script no longer requires that in the input object is of ET_DYN type. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-05-16Linux: Introduce __brk_call for invoking the brk system callFlorian Weimer5-78/+71
Alpha and sparc can now use the generic implementation. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-05-16x86_64: Remove bzero optimizationAdhemerval Zanella11-235/+2
Both symbols are marked as legacy in POSIX.1-2001 and removed on POSIX.1-2008, although the prototypes are defined for _GNU_SOURCE or _DEFAULT_SOURCE. GCC also replaces bcopy with a memmove and bzero with memset on default configuration (to actually get a bzero libc call the code requires to omit string.h inclusion and built with -fno-builtin), so it is highly unlikely programs are actually calling libc bzero symbol. On a recent Linux distro (Ubuntu 22.04), there is no bzero calls by the installed binaries. $ cat count_bstring.sh #!/bin/bash files=`IFS=':';for i in $PATH; do test -d "$i" && find "$i" -maxdepth 1 -executable -type f; done` total=0 for file in $files; do symbols=`objdump -R $file 2>&1` if [ $? -eq 0 ]; then ncalls=`echo $symbols | grep -w $1 | wc -l` ((total=total+ncalls)) if [ $ncalls -gt 0 ]; then echo "$file: $ncalls" fi fi done echo "TOTAL=$total" $ ./count_bstring.sh bzero TOTAL=0 Checked on x86_64-linux-gnu.
2022-05-13RISC-V: Use an autoconf template to produce `preconfigure'Maciej W. Rozycki2-15/+82
Avoid fiddling with autoconf internals and use AC_DEFINE_UNQUOTED to define macros in the configuration headers rather than handcoding an equivalent shell sequence with the use of the `as_echo' undocumented variable. Switch to using AC_MSG_ERROR rather than `echo' and `exit' directly for error handling. Owing to the lack of any kind of error annotation it makes it difficult to spot the message in the flood in a parallel build and neither it is logged in `config.log'. Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-05-13MIPS: Use an autoconf template to produce `preconfigure'Maciej W. Rozycki2-2/+39
Avoid fiddling with autoconf internals and use AC_DEFINE_UNQUOTED to define macros in the configuration headers rather than handcoding an equivalent shell sequence with the use of the `as_echo' undocumented variable. Similarly use AC_MSG_ERROR for error handling rather than the internal undocumented `as_fn_error' variable. Switch to using 1 as the exit code as it makes no sense to refer $? in the contexts involved, it's not a command failure handled there.
2022-05-13m68k: Use an autoconf template to produce `preconfigure'Maciej W. Rozycki2-2/+27
Switch to using AC_MSG_ERROR rather than `echo' and `exit' directly for error handling. Owing to the lack of any kind of error annotation it makes it difficult to spot the message in the flood in a parallel build and neither it is logged in `config.log'.
2022-05-13C-SKY: Use an autoconf template to produce `preconfigure'Maciej W. Rozycki2-10/+72
Avoid fiddling with autoconf internals and use AC_DEFINE_UNQUOTED to define macros in the configuration headers rather than handcoding an equivalent shell sequence with the use of the `as_echo' undocumented variable. Switch to using AC_MSG_ERROR rather than `echo' and `exit' directly for error handling. Owing to the lack of any kind of error annotation it makes it difficult to spot the message in the flood in a parallel build and neither it is logged in `config.log'.
2022-05-13stdio: Remove the usage of $(fno-unit-at-a-time) for siglist.cAdhemerval Zanella4-17/+25
The siglist.c is built with -fno-toplevel-reorder to avoid compiler to reorder the compat assembly directives due an assembler issue [1] (fixed on 2.39). This patch removes the compiler flags by split the compat symbol generation in two phases. First the __sys_siglist and __sys_sigabbrev without any compat symbol directive is preprocessed to generate an assembly source code. This generate assembly is then used as input on a platform agnostic siglist.S which then creates the compat definitions. This prevents compiler to move any compat directive prior the _sys_errlist definition itself. Checked on a make check run-built-tests=no on all affected ABIs. Reviewed-by: Fangrui Song <maskray@google.com>
2022-05-13stdio: Remove the usage of $(fno-unit-at-a-time) for errlist.cAdhemerval Zanella7-14/+28
The errlist.c is built with -fno-toplevel-reorder to avoid compiler to reorder the compat assembly directives due an assembler issue [1] (fixed on 2.39). This patch removes the compiler flags by split the compat symbol generation in two phases. First the _sys_errlist_internal internal without any compat symbol directive is preprocessed to generate an assembly source code. This generate assembly is then used as input on a platform agnostic errlist-data.S which then creates the compat definitions. This prevents compiler to move any compat directive prior the _sys_errlist_internal definition itself. Checked on a make check run-built-tests=no on all affected ABIs. [1] https://sourceware.org/bugzilla/show_bug.cgi?id=29012
2022-05-09nptl: Add backoff mechanism to spinlock loopWangyang Guo3-0/+75
When mutiple threads waiting for lock at the same time, once lock owner releases the lock, waiters will see lock available and all try to lock, which may cause an expensive CAS storm. Binary exponential backoff with random jitter is introduced. As try-lock attempt increases, there is more likely that a larger number threads compete for adaptive mutex lock, so increase wait time in exponential. A random jitter is also added to avoid synchronous try-lock from other threads. v2: Remove read-check before try-lock for performance. v3: 1. Restore read-check since it works well in some platform. 2. Make backoff arch dependent, and enable it for x86_64. 3. Limit max backoff to reduce latency in large critical section. v4: Fix strict-prototypes error in sysdeps/nptl/pthread_mutex_backoff.h v5: Commit log updated for regression in large critical section. Result of pthread-mutex-locks bench Test Platform: Xeon 8280L (2 socket, 112 CPUs in total) First Row: thread number First Col: critical section length Values: backoff vs upstream, time based, low is better non-critical-length: 1 1 2 4 8 16 32 64 112 140 0 0.99 0.58 0.52 0.49 0.43 0.44 0.46 0.52 0.54 1 0.98 0.43 0.56 0.50 0.44 0.45 0.50 0.56 0.57 2 0.99 0.41 0.57 0.51 0.45 0.47 0.48 0.60 0.61 4 0.99 0.45 0.59 0.53 0.48 0.49 0.52 0.64 0.65 8 1.00 0.66 0.71 0.63 0.56 0.59 0.66 0.72 0.71 16 0.97 0.78 0.91 0.73 0.67 0.70 0.79 0.80 0.80 32 0.95 1.17 0.98 0.87 0.82 0.86 0.89 0.90 0.90 64 0.96 0.95 1.01 1.01 0.98 1.00 1.03 0.99 0.99 128 0.99 1.01 1.01 1.17 1.08 1.12 1.02 0.97 1.02 non-critical-length: 32 1 2 4 8 16 32 64 112 140 0 1.03 0.97 0.75 0.65 0.58 0.58 0.56 0.70 0.70 1 0.94 0.95 0.76 0.65 0.58 0.58 0.61 0.71 0.72 2 0.97 0.96 0.77 0.66 0.58 0.59 0.62 0.74 0.74 4 0.99 0.96 0.78 0.66 0.60 0.61 0.66 0.76 0.77 8 0.99 0.99 0.84 0.70 0.64 0.66 0.71 0.80 0.80 16 0.98 0.97 0.95 0.76 0.70 0.73 0.81 0.85 0.84 32 1.04 1.12 1.04 0.89 0.82 0.86 0.93 0.91 0.91 64 0.99 1.15 1.07 1.00 0.99 1.01 1.05 0.99 0.99 128 1.00 1.21 1.20 1.22 1.25 1.31 1.12 1.10 0.99 non-critical-length: 128 1 2 4 8 16 32 64 112 140 0 1.02 1.00 0.99 0.67 0.61 0.61 0.61 0.74 0.73 1 0.95 0.99 1.00 0.68 0.61 0.60 0.60 0.74 0.74 2 1.00 1.04 1.00 0.68 0.59 0.61 0.65 0.76 0.76 4 1.00 0.96 0.98 0.70 0.63 0.63 0.67 0.78 0.77 8 1.01 1.02 0.89 0.73 0.65 0.67 0.71 0.81 0.80 16 0.99 0.96 0.96 0.79 0.71 0.73 0.80 0.84 0.84 32 0.99 0.95 1.05 0.89 0.84 0.85 0.94 0.92 0.91 64 1.00 0.99 1.16 1.04 1.00 1.02 1.06 0.99 0.99 128 1.00 1.06 0.98 1.14 1.39 1.26 1.08 1.02 0.98 There is regression in large critical section. But adaptive mutex is aimed for "quick" locks. Small critical section is more common when users choose to use adaptive pthread_mutex. Signed-off-by: Wangyang Guo <wangyang.guo@intel.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-05-09Linux: Implement a useful version of _startup_fatalFlorian Weimer3-19/+65
On i386 and ia64, the TCB is not available at this point. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-05-09ia64: Always define IA64_USE_NEW_STUB as a flag macroFlorian Weimer2-13/+15
And keep the previous definition if it exists. This allows disabling IA64_USE_NEW_STUB while keeping USE_DL_SYSINFO defined. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-05-06linux: Fix posix_spawn return code if clone fails (BZ#29109)Adhemerval Zanella1-1/+1
The __clone_internal returns the error on errno. Checked on x86_64-linux-gnu.
2022-05-05clock_adjtime: Use __nonnull to avoid null pointerXiaoming Ni2-3/+3
clock_adjtime()/clock_adjtime64() Add __nonnull((2)) to avoid null pointer access. Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27662 Link: https://sourceware.org/bugzilla/show_bug.cgi?id=29084 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-05-05ntp_xxxtimex: Use __nonnull to avoid null pointerXiaoming Ni2-8/+8
ntp_gettime() ntp_gettime64() ntp_gettimex() ntp_gettimex64() ntp_adjtime() Add __nonnull((1)) to avoid null pointer access. Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27662 Link: https://sourceware.org/bugzilla/show_bug.cgi?id=29084 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-05-05adjtimex/adjtimex64: Use __nonnull to avoid null pointerXiaoming Ni2-4/+4
Add __nonnull((1)) to the adjtimex()/adjtimex64() function declaration to avoid null pointer access. Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27662 Link: https://sourceware.org/bugzilla/show_bug.cgi?id=29084 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-05-05hurd spawni: Fix reauthenticating closed fdsSamuel Thibault1-1/+1
When an fd is closed, the port cell remains, but the port becomes MACH_PORT_NULL, so we have to guard against it.
2022-05-04Linux: Define MMAP_CALL_INTERNALFlorian Weimer3-12/+30
Unlike MMAP_CALL, this avoids a TCB dependency for an errno update on failure. <mmap_internal.h> cannot be included as is on several architectures due to the definition of page_unit, so introduce a separate header file for the definition of MMAP_CALL and MMAP_CALL_INTERNAL, <mmap_call.h>. Reviewed-by: Stefan Liebler <stli@linux.ibm.com>
2022-05-04i386: Honor I386_USE_SYSENTER for 6-argument Linux system callsFlorian Weimer3-3/+37
Introduce an int-80h-based version of __libc_do_syscall and use it if I386_USE_SYSENTER is defined as 0. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-05-04i386: Remove OPTIMIZE_FOR_GCC_5 from Linux libc-do-syscall.SFlorian Weimer1-3/+0
After commit a78e6a10d0b50d0ca80309775980fc99944b1727 ("i386: Remove broken CAN_USE_REGISTER_ASM_EBP (bug 28771)"), it is never defined. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-05-02powerpc32: Remove unused HAVE_PPC_SECURE_PLTFangrui Song2-41/+0
82a79e7d1843f9d90075a0bf2f04557040829bb0 removed the only user of HAVE_PPC_SECURE_PLT. Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-04-29benchtests: Better libmvec integrationSiddhesh Poyarekar1-4/+0
Improve libmvec benchmark integration so that in future other architectures may be able to run their libmvec benchmarks as well. This now allows libmvec benchmarks to be run with `make BENCHSET=bench-math`. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-04-29benchtests: Add UNSUPPORTED benchmark statusSiddhesh Poyarekar1-6/+6
The libmvec benchmarks print a message indicating that a certain CPU feature is unsupported and exit prematurelyi, which breaks the JSON in bench.out. Handle this more elegantly in the bench makefile target by adding support for an UNSUPPORTED exit status (77) so that bench.out continues to have output for valid tests. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-04-28linux: Fix fchmodat with AT_SYMLINK_NOFOLLOW for 64 bit time_t (BZ#29097)Adhemerval Zanella1-2/+2
The AT_SYMLINK_NOFOLLOW emulation ues the default 32 bit stat internal calls, which fails with EOVERFLOW if the file constains timestamps beyond 2038. Checked on i686-linux-gnu.
2022-04-27sysdeps: Add 'get_fast_jitter' interace in fast-jitter.hNoah Goldstein1-0/+42
'get_fast_jitter' is meant to be used purely for performance purposes. In all cases it's used it should be acceptable to get no randomness (see default case). An example use case is in setting jitter for retries between threads at a lock. There is a performance benefit to having jitter, but only if the jitter can be generated very quickly and ultimately there is no serious issue if no jitter is generated. The implementation generally uses 'HP_TIMING_NOW' iff it is inlined (avoid any potential syscall paths). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-04-27posix/glob.c: update from gnulibDJ Delorie1-0/+1
Copied from gnulib/lib/glob.c in order to fix rhbz 1982608 Also fixes swbz 25659 Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
2022-04-27linux: Fix missing internal 64 bit time_t stat usageAdhemerval Zanella2-4/+4
These are two missing spots initially done by 52a5fe70a2c77935. Checked on i686-linux-gnu.
2022-04-26posix: Remove unused definition on _ForkAdhemerval Zanella1-3/+0
Checked on x86_64-linux-gnu.
2022-04-26elf: Replace PI_STATIC_AND_HIDDEN with opposite HIDDEN_VAR_NEEDS_DYNAMIC_RELOCFangrui Song39-78/+42
PI_STATIC_AND_HIDDEN indicates whether accesses to internal linkage variables and hidden visibility variables in a shared object (ld.so) need dynamic relocations (usually R_*_RELATIVE). PI (position independent) in the macro name is a misnomer: a code sequence using GOT is typically position-independent as well, but using dynamic relocations does not meet the requirement. Not defining PI_STATIC_AND_HIDDEN is legacy and we expect that all new ports will define PI_STATIC_AND_HIDDEN. Current ports defining PI_STATIC_AND_HIDDEN are more than the opposite. Change the configure default. No functional change. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-04-26i386: Regenerate ulpsCarlos O'Donell2-2/+2
These failures were caught while building glibc master for Fedora Rawhide which is built with '-mtune=generic -msse2 -mfpmath=sse' using gcc 11.3 (gcc-11.3.1-2.fc35) on a Cascadelake Intel Xeon processor.
2022-04-25elf: Remove unused enum allowmaskFangrui Song1-11/+0
Unused since 52a01100ad011293197637e42b5be1a479a2f4ae ("elf: Remove ad-hoc restrictions on dlopen callers [BZ #22787]"). Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-04-22x86: Optimize {str|wcs}rchr-evexNoah Goldstein1-181/+290
The new code unrolls the main loop slightly without adding too much overhead and minimizes the comparisons for the search CHAR. Geometric Mean of all benchmarks New / Old: 0.755 See email for all results. Full xcheck passes on x86_64 with and without multiarch enabled. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-04-22x86: Optimize {str|wcs}rchr-avx2Noah Goldstein1-157/+269
The new code unrolls the main loop slightly without adding too much overhead and minimizes the comparisons for the search CHAR. Geometric Mean of all benchmarks New / Old: 0.832 See email for all results. Full xcheck passes on x86_64 with and without multiarch enabled. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-04-22x86: Optimize {str|wcs}rchr-sse2Noah Goldstein4-444/+339
The new code unrolls the main loop slightly without adding too much overhead and minimizes the comparisons for the search CHAR. Geometric Mean of all benchmarks New / Old: 0.741 See email for all results. Full xcheck passes on x86_64 with and without multiarch enabled. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-04-22x86-64: Fix SSE2 memcmp and SSSE3 memmove for x32H.J. Lu2-0/+8
Clear the upper 32 bits in RDX (memory size) for x32 to fix FAIL: string/tst-size_t-memcmp FAIL: string/tst-size_t-memcmp-2 FAIL: string/tst-size_t-memcpy FAIL: wcsmbs/tst-size_t-wmemcmp on x32 introduced by 8804157ad9 x86: Optimize memcmp SSE2 in memcmp.S 26b2478322 x86: Reduce code size of mem{move|pcpy|cpy}-ssse3 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-04-22Default to --with-default-link=no (bug 25812)Florian Weimer1-0/+6
This is necessary to place the libio vtables into the RELRO segment. New tests elf/tst-relro-ldso and elf/tst-relro-libc are added to verify that this is what actually happens. The new tests fail on ia64 due to lack of (default) RELRO support inbutils, so they are XFAILed there.
2022-04-20m68k: Handle fewer relocations for RTLD_BOOTSTRAP (#BZ29071)Fangrui Song1-6/+6
m68k is a non-PI_STATIC_AND_HIDDEN arch which uses a GOT relocation when loading the address of a jump table. The GOT load may be reordered before processing R_68K_RELATIVE relocations, leading to an unrelocated/incorrect jump table, which will cause a crash. The foolproof approach is to add an optimization barrier (e.g. calling an non-inlinable function after relative relocations are resolved). That is non-trivial given the current code structure, so just use the simple approach to avoid the jump table: handle only the essential reloctions for RTLD_BOOTSTRAP code. This is based on Andreas Schwab's patch and fixed ld.so crash on m68k. Reviewed-by: Adheemrval Zanella <adhemerval.zanella@linaro.org>
2022-04-19x86: Fix missing __wmemcmp def for disable-multiarch buildNoah Goldstein2-8/+6
commit 8804157ad9da39631703b92315460808eac86b0c Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Fri Apr 15 12:27:59 2022 -0500 x86: Optimize memcmp SSE2 in memcmp.S Only defined wmemcmp and missed __wmemcmp. This commit fixes that by defining __wmemcmp and setting wmemcmp as a weak alias to __wmemcmp. Both multiarch and disable-multiarch builds succeed and full xchecks pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-04-19elf: Remove __libc_init_secureFangrui Song4-82/+0
After 73fc4e28b9464f0e13edc719a5372839970e7ddb, __libc_enable_secure_decided is always 0 and a statically linked executable may overwrite __libc_enable_secure without considering AT_SECURE. The __libc_enable_secure has been correctly initialized in _dl_aux_init, so just remove __libc_enable_secure_decided and __libc_init_secure. This allows us to remove some startup_get*id functions from 22b79ed7f413cd980a7af0cf258da5bf82b6d5e5. Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-04-18mips: Fix mips64n32 64 bit time_t stat support (BZ#29069)=Joshua Kinard1-15/+23
Add missing support initially added by 4e8521333bea6e89fcef1020 (which missed n32 stat).
2022-04-15x86: Cleanup page cross code in memcmp-avx2-movbe.SNoah Goldstein1-37/+61
Old code was both inefficient and wasted code size. New code (-62 bytes) and comparable or better performance in the page cross case. geometric_mean(N=20) of page cross cases New / Original: 0.960 size, align0, align1, ret, New Time/Old Time 1, 4095, 0, 0, 1.001 1, 4095, 0, 1, 0.999 1, 4095, 0, -1, 1.0 2, 4094, 0, 0, 1.0 2, 4094, 0, 1, 1.0 2, 4094, 0, -1, 1.0 3, 4093, 0, 0, 1.0 3, 4093, 0, 1, 1.0 3, 4093, 0, -1, 1.0 4, 4092, 0, 0, 0.987 4, 4092, 0, 1, 1.0 4, 4092, 0, -1, 1.0 5, 4091, 0, 0, 0.984 5, 4091, 0, 1, 1.002 5, 4091, 0, -1, 1.005 6, 4090, 0, 0, 0.993 6, 4090, 0, 1, 1.001 6, 4090, 0, -1, 1.003 7, 4089, 0, 0, 0.991 7, 4089, 0, 1, 1.0 7, 4089, 0, -1, 1.001 8, 4088, 0, 0, 0.875 8, 4088, 0, 1, 0.881 8, 4088, 0, -1, 0.888 9, 4087, 0, 0, 0.872 9, 4087, 0, 1, 0.879 9, 4087, 0, -1, 0.883 10, 4086, 0, 0, 0.878 10, 4086, 0, 1, 0.886 10, 4086, 0, -1, 0.873 11, 4085, 0, 0, 0.878 11, 4085, 0, 1, 0.881 11, 4085, 0, -1, 0.879 12, 4084, 0, 0, 0.873 12, 4084, 0, 1, 0.889 12, 4084, 0, -1, 0.875 13, 4083, 0, 0, 0.873 13, 4083, 0, 1, 0.863 13, 4083, 0, -1, 0.863 14, 4082, 0, 0, 0.838 14, 4082, 0, 1, 0.869 14, 4082, 0, -1, 0.877 15, 4081, 0, 0, 0.841 15, 4081, 0, 1, 0.869 15, 4081, 0, -1, 0.876 16, 4080, 0, 0, 0.988 16, 4080, 0, 1, 0.99 16, 4080, 0, -1, 0.989 17, 4079, 0, 0, 0.978 17, 4079, 0, 1, 0.981 17, 4079, 0, -1, 0.98 18, 4078, 0, 0, 0.981 18, 4078, 0, 1, 0.98 18, 4078, 0, -1, 0.985 19, 4077, 0, 0, 0.977 19, 4077, 0, 1, 0.979 19, 4077, 0, -1, 0.986 20, 4076, 0, 0, 0.977 20, 4076, 0, 1, 0.986 20, 4076, 0, -1, 0.984 21, 4075, 0, 0, 0.977 21, 4075, 0, 1, 0.983 21, 4075, 0, -1, 0.988 22, 4074, 0, 0, 0.983 22, 4074, 0, 1, 0.994 22, 4074, 0, -1, 0.993 23, 4073, 0, 0, 0.98 23, 4073, 0, 1, 0.992 23, 4073, 0, -1, 0.995 24, 4072, 0, 0, 0.989 24, 4072, 0, 1, 0.989 24, 4072, 0, -1, 0.991 25, 4071, 0, 0, 0.99 25, 4071, 0, 1, 0.999 25, 4071, 0, -1, 0.996 26, 4070, 0, 0, 0.993 26, 4070, 0, 1, 0.995 26, 4070, 0, -1, 0.998 27, 4069, 0, 0, 0.993 27, 4069, 0, 1, 0.999 27, 4069, 0, -1, 1.0 28, 4068, 0, 0, 0.997 28, 4068, 0, 1, 1.0 28, 4068, 0, -1, 0.999 29, 4067, 0, 0, 0.996 29, 4067, 0, 1, 0.999 29, 4067, 0, -1, 0.999 30, 4066, 0, 0, 0.991 30, 4066, 0, 1, 1.001 30, 4066, 0, -1, 0.999 31, 4065, 0, 0, 0.988 31, 4065, 0, 1, 0.998 31, 4065, 0, -1, 0.998 Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-04-15x86: Remove memcmp-sse4.SNoah Goldstein4-813/+0
Code didn't actually use any sse4 instructions since `ptest` was removed in: commit 2f9062d7171850451e6044ef78d91ff8c017b9c0 Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Wed Nov 10 16:18:56 2021 -0600 x86: Shrink memcmp-sse4.S code size The new memcmp-sse2 implementation is also faster. geometric_mean(N=20) of page cross cases SSE2 / SSE4: 0.905 Note there are two regressions preferring SSE2 for Size = 1 and Size = 65. Size = 1: size, align0, align1, ret, New Time/Old Time 1, 1, 1, 0, 1.2 1, 1, 1, 1, 1.197 1, 1, 1, -1, 1.2 This is intentional. Size == 1 is significantly less hot based on profiles of GCC11 and Python3 than sizes [4, 8] (which is made hotter). Python3 Size = 1 -> 13.64% Python3 Size = [4, 8] -> 60.92% GCC11 Size = 1 -> 1.29% GCC11 Size = [4, 8] -> 33.86% size, align0, align1, ret, New Time/Old Time 4, 4, 4, 0, 0.622 4, 4, 4, 1, 0.797 4, 4, 4, -1, 0.805 5, 5, 5, 0, 0.623 5, 5, 5, 1, 0.777 5, 5, 5, -1, 0.802 6, 6, 6, 0, 0.625 6, 6, 6, 1, 0.813 6, 6, 6, -1, 0.788 7, 7, 7, 0, 0.625 7, 7, 7, 1, 0.799 7, 7, 7, -1, 0.795 8, 8, 8, 0, 0.625 8, 8, 8, 1, 0.848 8, 8, 8, -1, 0.914 9, 9, 9, 0, 0.625 Size = 65: size, align0, align1, ret, New Time/Old Time 65, 0, 0, 0, 1.103 65, 0, 0, 1, 1.216 65, 0, 0, -1, 1.227 65, 65, 0, 0, 1.091 65, 0, 65, 1, 1.19 65, 65, 65, -1, 1.215 This is because A) the checks in range [65, 96] are now unrolled 2x and B) because smaller values <= 16 are now given a hotter path. By contrast the SSE4 version has a branch for Size = 80. The unrolled version has get better performance for returns which need both comparisons. size, align0, align1, ret, New Time/Old Time 128, 4, 8, 0, 0.858 128, 4, 8, 1, 0.879 128, 4, 8, -1, 0.888 As well, out of microbenchmark environments that are not full predictable the branch will have a real-cost. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-04-15x86: Optimize memcmp SSE2 in memcmp.SNoah Goldstein8-376/+575
New code save size (-303 bytes) and has significantly better performance. geometric_mean(N=20) of page cross cases New / Original: 0.634 Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-04-15stdio: Split __get_errname definition from errlist.cAdhemerval Zanella1-0/+21
The loader does not need to pull all __get_errlist definitions and its size is decreased: Before: $ size elf/ld.so text data bss dec hex filename 197774 11024 456 209254 33166 elf/ld.so After: $ size elf/ld.so text data bss dec hex filename 191510 9936 456 201902 314ae elf/ld.so Checked on x86_64-linux-gnu.