aboutsummaryrefslogtreecommitdiff
path: root/sysdeps/x86
AgeCommit message (Collapse)AuthorFilesLines
2024-01-15x86-64: Check if mprotect works before rewriting PLTH.J. Lu1-1/+7
Systemd execution environment configuration may prohibit changing a memory mapping to become executable: MemoryDenyWriteExecute= Takes a boolean argument. If set, attempts to create memory mappings that are writable and executable at the same time, or to change existing memory mappings to become executable, or mapping shared memory segments as executable, are prohibited. When it is set, systemd service stops working if PLT rewrite is enabled. Check if mprotect works before rewriting PLT. This fixes BZ #31230. This also works with SELinux when deny_execmem is on. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-01-11x86-64/cet: Make CET feature check specific to Linux/x86H.J. Lu5-33/+35
CET feature bits in TCB, which are Linux specific, are used to check if CET features are active. Move CET feature check to Linux/x86 directory. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-01-10i386: Remove CET support bitsH.J. Lu5-135/+3
1. Remove _dl_runtime_resolve_shstk and _dl_runtime_profile_shstk. 2. Move CET offsets from x86 cpu-features-offsets.sym to x86-64 features-offsets.sym. 3. Rename x86 cet-control.h to x86-64 feature-control.h since it is only for x86-64 and also used for PLT rewrite. 4. Add x86-64 ldsodefs.h to include feature-control.h. 5. Change TUNABLE_CALLBACK (set_plt_rewrite) to x86-64 only. 6. Move x86 dl-procruntime.c to x86-64. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-01-10x86-64/cet: Move check-cet.awk to x86_64H.J. Lu1-53/+0
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-01-10x86-64/cet: Move dl-cet.[ch] to x86_64 directoriesH.J. Lu1-364/+0
Since CET is only enabled for x86-64, move dl-cet.[ch] to x86_64 directories. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-01-10x86: Move x86-64 shadow stack startup codesH.J. Lu1-74/+0
Move sysdeps/x86/libc-start.h to sysdeps/x86_64/libc-start.h and use sysdeps/generic/libc-start.h for i386. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-01-09i386: Remove CET supportAdhemerval Zanella1-44/+0
CET is only support for x86_64, this patch reverts: - faaee1f07ed x86: Support shadow stack pointer in setjmp/longjmp. - be9ccd27c09 i386: Add _CET_ENDBR to indirect jump targets in add_n.S/sub_n.S - c02695d7764 x86/CET: Update vfork to prevent child return - 5d844e1b725 i386: Enable CET support in ucontext functions - 124bcde683 x86: Add _CET_ENDBR to functions in crti.S - 562837c002 x86: Add _CET_ENDBR to functions in dl-tlsdesc.S - f753fa7dea x86: Support IBT and SHSTK in Intel CET [BZ #21598] - 825b58f3fb i386-mcount.S: Add _CET_ENDBR to _mcount and __fentry__ - 7e119cd582 i386: Use _CET_NOTRACK in i686/memcmp.S - 177824e232 i386: Use _CET_NOTRACK in memcmp-sse4.S - 0a899af097 i386: Use _CET_NOTRACK in memcpy-ssse3-rep.S - 7fb613361c i386: Use _CET_NOTRACK in memcpy-ssse3.S - 77a8ae0948 i386: Use _CET_NOTRACK in memset-sse2-rep.S - 00e7b76a8f i386: Use _CET_NOTRACK in memset-sse2.S - 90d15dc577 i386: Use _CET_NOTRACK in strcat-sse2.S - f1574581c7 i386: Use _CET_NOTRACK in strcpy-sse2.S - 4031d7484a i386/sub_n.S: Add a missing _CET_ENDBR to indirect jump - target - Checked on i686-linux-gnu.
2024-01-09x86: Move CET infrastructure to x86_64Adhemerval Zanella53-1501/+0
The CET is only supported for x86_64 and there is no plan to add kernel support for i386. Move the Makefile rules and files from the generic x86 folder to x86_64 one. Checked on x86_64-linux-gnu and i686-linux-gnu.
2024-01-08Remove ia64-linux-gnuAdhemerval Zanella1-13/+0
Linux 6.7 removed ia64 from the official tree [1], following the general principle that a glibc port needs upstream support for the architecture in all the components it depends on (binutils, GCC, and the Linux kernel). Apart from the removal of sysdeps/ia64 and sysdeps/unix/sysv/linux/ia64, there are updates to various comments referencing ia64 for which removal of those references seemed appropriate. The configuration is removed from README and build-many-glibcs.py. The CONTRIBUTED-BY, elf/elf.h, manual/contrib.texi (the porting mention), *.po files, config.guess, and longlong.h are not changed. For Linux it allows cleanup some clone2 support on multiple files. The following bug can be closed as WONTFIX: BZ 22634 [2], BZ 14250 [3], BZ 21634 [4], BZ 10163 [5], BZ 16401 [6], and BZ 11585 [7]. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=43ff221426d33db909f7159fdf620c3b052e2d1c [2] https://sourceware.org/bugzilla/show_bug.cgi?id=22634 [3] https://sourceware.org/bugzilla/show_bug.cgi?id=14250 [4] https://sourceware.org/bugzilla/show_bug.cgi?id=21634 [5] https://sourceware.org/bugzilla/show_bug.cgi?id=10163 [6] https://sourceware.org/bugzilla/show_bug.cgi?id=16401 [7] https://sourceware.org/bugzilla/show_bug.cgi?id=11585 Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-01-05elf: Add ELF_DYNAMIC_AFTER_RELOC to rewrite PLTH.J. Lu4-1/+38
Add ELF_DYNAMIC_AFTER_RELOC to allow target specific processing after relocation. For x86-64, add #define DT_X86_64_PLT (DT_LOPROC + 0) #define DT_X86_64_PLTSZ (DT_LOPROC + 1) #define DT_X86_64_PLTENT (DT_LOPROC + 3) 1. DT_X86_64_PLT: The address of the procedure linkage table. 2. DT_X86_64_PLTSZ: The total size, in bytes, of the procedure linkage table. 3. DT_X86_64_PLTENT: The size, in bytes, of a procedure linkage table entry. With the r_addend field of the R_X86_64_JUMP_SLOT relocation set to the memory offset of the indirect branch instruction. Define ELF_DYNAMIC_AFTER_RELOC for x86-64 to rewrite the PLT section with direct branch after relocation when the lazy binding is disabled. PLT rewrite is disabled by default since SELinux may disallow modifying code pages and ld.so can't detect it in all cases. Use $ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1 to enable PLT rewrite with 32-bit direct jump at run-time or $ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=2 to enable PLT rewrite with 32-bit direct jump and on APX processors with 64-bit absolute jump at run-time. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-01-04x86-64/cet: Check the restore token in longjmpH.J. Lu1-0/+3
setcontext and swapcontext put a restore token on the old shadow stack which is used to restore the target shadow stack when switching user contexts. When longjmp from a user context, the target shadow stack can be different from the current shadow stack and INCSSP can't be used to restore the shadow stack pointer to the target shadow stack. Update longjmp to search for a restore token. If found, use the token to restore the shadow stack pointer before using INCSSP to pop the shadow stack. Stop the token search and use INCSSP if the shadow stack entry value is the same as the current shadow stack pointer. It is a user error if there is a shadow stack switch without leaving a restore token on the old shadow stack. The only difference between __longjmp.S and __longjmp_chk.S is that __longjmp_chk.S has a check for invalid longjmp usages. Merge __longjmp.S and __longjmp_chk.S by adding the CHECK_INVALID_LONGJMP macro. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-01-04i386: Ignore --enable-cetH.J. Lu2-113/+0
Since shadow stack is only supported for x86-64, ignore --enable-cet for i386. Always setting $(enable-cet) for i386 to "no" to support ifneq ($(enable-cet),no) in x86 Makefiles. We can't use ifeq ($(enable-cet),yes) since $(enable-cet) can be "yes", "no" or "permissive". Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-01-01x86/cet: Add -fcf-protection=none before -fcf-protection=branchH.J. Lu1-4/+4
When shadow stack is enabled, some CET tests failed when compiled with GCC 14: FAIL: elf/tst-cet-legacy-4 FAIL: elf/tst-cet-legacy-5a FAIL: elf/tst-cet-legacy-6a which are caused by https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113039 These tests use -fcf-protection -fcf-protection=branch and assume that -fcf-protection=branch will override -fcf-protection. But this GCC 14 commit: https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=1c6231c05bdcca changed the -fcf-protection behavior such that -fcf-protection -fcf-protection=branch is treated the same as -fcf-protection Use -fcf-protection -fcf-protection=none -fcf-protection=branch as the workaround. This fixes BZ #31187. Tested with GCC 13 and GCC 14 on Intel Tiger Lake. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-01-01Update copyright dates with scripts/update-copyrightsPaul Eggert136-136/+136
2024-01-01x86/cet: Run some CET tests with shadow stackH.J. Lu4-0/+17
When CET is disabled by default, run some CET tests with shadow stack enabled using $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=SHSTK
2024-01-01x86/cet: Don't set CET active by defaultH.J. Lu2-2/+15
Not all CET enabled applications and libraries have been properly tested in CET enabled environments. Some CET enabled applications or libraries will crash or misbehave when CET is enabled. Don't set CET active by default so that all applications and libraries will run normally regardless of whether CET is active or not. Shadow stack can be enabled by $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=SHSTK at run-time if shadow stack can be enabled by kernel. NB: This commit can be reverted if it is OK to enable CET by default for all applications and libraries.
2024-01-01x86/cet: Check feature_1 in TCB for active IBT and SHSTKH.J. Lu3-1/+35
Initially, IBT and SHSTK are marked as active when CPU supports them and CET are enabled in glibc. They can be disabled early by tunables before relocation. Since after relocation, GLRO(dl_x86_cpu_features) becomes read-only, we can't update GLRO(dl_x86_cpu_features) to mark IBT and SHSTK as inactive. Instead, check the feature_1 field in TCB to decide if IBT and SHST are active.
2024-01-01x86/cet: Enable shadow stack during startupH.J. Lu6-93/+93
Previously, CET was enabled by kernel before passing control to user space and the startup code must disable CET if applications or shared libraries aren't CET enabled. Since the current kernel only supports shadow stack and won't enable shadow stack before passing control to user space, we need to enable shadow stack during startup if the application and all shared library are shadow stack enabled. There is no need to disable shadow stack at startup. Shadow stack can only be enabled in a function which will never return. Otherwise, shadow stack will underflow at the function return. 1. GL(dl_x86_feature_1) is set to the CET features which are supported by the processor and are not disabled by the tunable. Only non-zero features in GL(dl_x86_feature_1) should be enabled. After enabling shadow stack with ARCH_SHSTK_ENABLE, ARCH_SHSTK_STATUS is used to check if shadow stack is really enabled. 2. Use ARCH_SHSTK_ENABLE in RTLD_START in dynamic executable. It is safe since RTLD_START never returns. 3. Call arch_prctl (ARCH_SHSTK_ENABLE) from ARCH_SETUP_TLS in static executable. Since the start function using ARCH_SETUP_TLS never returns, it is safe to enable shadow stack in ARCH_SETUP_TLS.
2024-01-01x86/cet: Sync with Linux kernel 6.6 shadow stack interfaceH.J. Lu2-6/+11
Sync with Linux kernel 6.6 shadow stack interface. Since only x86-64 is supported, i386 shadow stack codes are unchanged and CET shouldn't be enabled for i386. 1. When the shadow stack base in TCB is unset, the default shadow stack is in use. Use the current shadow stack pointer as the marker for the default shadow stack. It is used to identify if the current shadow stack is the same as the target shadow stack when switching ucontexts. If yes, INCSSP will be used to unwind shadow stack. Otherwise, shadow stack restore token will be used. 2. Allocate shadow stack with the map_shadow_stack syscall. Since there is no function to explicitly release ucontext, there is no place to release shadow stack allocated by map_shadow_stack in ucontext functions. Such shadow stacks will be leaked. 3. Rename arch_prctl CET commands to ARCH_SHSTK_XXX. 4. Rewrite the CET control functions with the current kernel shadow stack interface. Since CET is no longer enabled by kernel, a separate patch will enable shadow stack during startup.
2023-12-20x86/cet: Don't disable CET if not single threadedH.J. Lu1-2/+9
In permissive mode, don't disable IBT nor SHSTK when dlopening a legacy shared library if not single threaded since IBT and SHSTK may be still enabled in other threads. Other threads with IBT or SHSTK enabled will crash when calling functions in the legacy shared library. Instead, an error will be issued.
2023-12-20x86: Modularize sysdeps/x86/dl-cet.cH.J. Lu1-176/+280
Improve readability and make maintenance easier for dl-feature.c by modularizing sysdeps/x86/dl-cet.c: 1. Support processors with: a. Only IBT. Or b. Only SHSTK. Or c. Both IBT and SHSTK. 2. Lock CET features only if IBT or SHSTK are enabled and are not enabled permissively.
2023-12-19Fix elf: Do not duplicate the GLIBC_TUNABLES stringH.J. Lu1-1/+1
commit 2a969b53c0b02fed7e43473a92f219d737fd217a Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Wed Dec 6 10:24:01 2023 -0300 elf: Do not duplicate the GLIBC_TUNABLES string has @@ -38,7 +39,7 @@ which isn't available. */ #define CHECK_GLIBC_IFUNC_PREFERRED_OFF(f, cpu_features, name, len) \ _Static_assert (sizeof (#name) - 1 == len, #name " != " #len); \ - if (memcmp (f, #name, len) == 0) \ + if (tunable_str_comma_strcmp_cte (&f, #name) == 0) \ { \ cpu_features->preferred[index_arch_##name] \ &= ~bit_arch_##name; \ @@ -46,12 +47,11 @@ Fix it by removing "== 0" after tunable_str_comma_strcmp_cte.
2023-12-19Fix elf: Do not duplicate the GLIBC_TUNABLES stringH.J. Lu1-5/+5
Fix issues in sysdeps/x86/tst-hwcap-tunables.c added by Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Wed Dec 6 10:24:01 2023 -0300 elf: Do not duplicate the GLIBC_TUNABLES string 1. -AVX,-AVX2,-AVX512F should be used to disable AVX, AVX2 and AVX512. 2. AVX512 IFUNC functions check AVX512VL. -AVX512VL should be added to disable these functions. This fixed: FAIL: elf/tst-hwcap-tunables ... [0] Spawned test for -Prefer_ERMS,-Prefer_FSRM,-AVX,-AVX2,-AVX_Usable,-AVX2_Usable,-AVX512F_Usable,-SSE4_1,-SSE4_2,-SSSE3,-Fast_Unaligned_Load,-ERMS,-AVX_Fast_Unaligned_Load error: subprocess failed: tst-tunables error: unexpected output from subprocess ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false [1] Spawned test for ,-,-Prefer_ERMS,-Prefer_FSRM,-AVX,-AVX2,-AVX_Usable,-AVX2_Usable,-AVX512F_Usable,-SSE4_1,-SSE4_2,,-SSSE3,-Fast_Unaligned_Load,,-,-ERMS,-AVX_Fast_Unaligned_Load,-, error: subprocess failed: tst-tunables error: unexpected output from subprocess ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false error: 2 test failures on Intel Tiger Lake.
2023-12-19i686: Do not raise exception traps on fesetexcept (BZ 30989)Adhemerval Zanella1-18/+5
According to ISO C23 (7.6.4.4), fesetexcept is supposed to set floating-point exception flags without raising a trap (unlike feraiseexcept, which is supposed to raise a trap if feenableexcept was called with the appropriate argument). The flags can be set in the 387 unit or in the SSE unit. To set a flag, it is sufficient to do it in the SSE unit, because that is guaranteed to not trap. However, on i386 CPUs that have only a 387 unit, set the flags in the 387, as long as this cannot trap. Checked on i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-12-19elf: Do not duplicate the GLIBC_TUNABLES stringAdhemerval Zanella3-75/+193
The tunable parsing duplicates the tunable environment variable so it null-terminates each one since it simplifies the later parsing. It has the drawback of adding another point of failure (__minimal_malloc failing), and the memory copy requires tuning the compiler to avoid mem operations calls. The parsing now tracks the tunable start and its size. The dl-tunable-parse.h adds helper functions to help parsing, like a strcmp that also checks for size and an iterator for suboptions that are comma-separated (used on hwcap parsing by x86, powerpc, and s390x). Since the environment variable is allocated on the stack by the kernel, it is safe to keep the references to the suboptions for later parsing of string tunables (as done by set_hwcaps by multiple architectures). Checked on x86_64-linux-gnu, powerpc64le-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-12-19x86/cet: Check CPU_FEATURE_ACTIVE in permissive modeH.J. Lu2-0/+6
Verify that CPU_FEATURE_ACTIVE works properly in permissive mode.
2023-12-19x86/cet: Check legacy shadow stack code in .init_array sectionH.J. Lu11-0/+330
Verify that legacy shadow stack code in .init_array section in application and shared library, which are marked as shadow stack enabled, will trigger segfault.
2023-12-19x86/cet: Add tests for GLIBC_TUNABLES=glibc.cpu.hwcaps=-SHSTKH.J. Lu3-0/+28
Verify that GLIBC_TUNABLES=glibc.cpu.hwcaps=-SHSTK turns off shadow stack properly.
2023-12-19x86/cet: Check CPU_FEATURE_ACTIVE when CET is disabledH.J. Lu3-0/+9
Verify that CPU_FEATURE_ACTIVE (SHSTK) works properly when CET is disabled.
2023-12-19x86/cet: Check legacy shadow stack applicationsH.J. Lu6-0/+130
Add tests to verify that legacy shadow stack applications run properly when shadow stack is enabled in Linux kernel.
2023-12-18x86/cet: Don't assume that SHSTK implies IBTH.J. Lu3-11/+11
Since shadow stack (SHSTK) is enabled in the Linux kernel without enabling indirect branch tracking (IBT), don't assume that SHSTK implies IBT. Use "CPU_FEATURE_ACTIVE (IBT)" to check if IBT is active and "CPU_FEATURE_ACTIVE (SHSTK)" to check if SHSTK is active.
2023-12-17x86/cet: Check user_shstk in /proc/cpuinfoH.J. Lu1-1/+1
Linux kernel reports CPU shadow stack feature in /proc/cpuinfo as user_shstk, instead of shstk.
2023-12-11x86: Check PT_GNU_PROPERTY earlyH.J. Lu1-40/+80
The PT_GNU_PROPERTY segment is scanned before PT_NOTE. For binaries with the PT_GNU_PROPERTY segment, we can check it to avoid scan of the PT_NOTE segment. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-12-11sysdeps/x86/Makefile: Split and sort testsH.J. Lu1-32/+78
Put each test on a separate line and sort tests.
2023-11-21x86: Use dl-symbol-redir-ifunc.h on cpu-tunablesAdhemerval Zanella1-27/+12
The dl-symbol-redir-ifunc.h redirects compiler-generated libcalls to arch-specific memory implementations to avoid ifunc calls where it is not yet possible. The memcmp-isa-default-impl.h aims to fix the same issue by calling the specific memset implementation directly. Using the memcmp symbol directly allows the compiler to inline the memset calls (especially because _dl_tunable_set_hwcaps uses constants values), generating better code. Checked on x86_64-linux-gnu. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-09-29x86: Add support for AVX10 preset and vec size in cpu-featuresNoah Goldstein4-3/+71
This commit add support for the new AVX10 cpu features: https://cdrdv2-public.intel.com/784267/355989-intel-avx10-spec.pdf We add checks for: - `AVX10`: Check if AVX10 is present. - `AVX10_{X,Y,Z}MM`: Check if a given vec class has AVX10 support. `make check` passes and cpuid output was checked against GNR/DMR on an emulator.
2023-08-29x86: Check the lower byte of EAX of CPUID leaf 2 [BZ #30643]H.J. Lu1-18/+13
The old Intel software developer manual specified that the low byte of EAX of CPUID leaf 2 returned 1 which indicated the number of rounds of CPUDID leaf 2 was needed to retrieve the complete cache information. The newer Intel manual has been changed to that it should always return 1 and be ignored. If the lower byte isn't 1, CPUID leaf 2 can't be used. In this case, we ignore CPUID leaf 2 and use CPUID leaf 4 instead. If CPUID leaf 4 doesn't contain the cache information, cache information isn't available at all. This addresses BZ #30643.
2023-08-11x86: Fix incorrect scope of setting `shared_per_thread` [BZ# 30745]Noah Goldstein1-4/+3
The: ``` if (shared_per_thread > 0 && threads > 0) shared_per_thread /= threads; ``` Code was accidentally moved to inside the else scope. This doesn't match how it was previously (before af992e7abd). This patch fixes that by putting the division after the `else` block.
2023-08-06x86: Fix for cache computation on AMD legacy cpus.Sajan Karumanchi1-27/+199
Some legacy AMD CPUs and hypervisors have the _cpuid_ '0x8000_001D' set to Zero, thus resulting in zeroed-out computed cache values. This patch reintroduces the old way of cache computation as a fail-safe option to handle these exceptions. Fixed 'level4_cache_size' value through handle_amd(). Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com> Tested-by: Florian Weimer <fweimer@redhat.com>
2023-07-27<sys/platform/x86.h>: Add APX supportH.J. Lu4-0/+11
Add support for Intel Advanced Performance Extensions: https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html to <sys/platform/x86.h>.
2023-07-18[PATCH v1] x86: Use `3/4*sizeof(per-thread-L3)` as low bound for NT threshold.Noah Goldstein1-3/+12
On some machines we end up with incomplete cache information. This can make the new calculation of `sizeof(total-L3)/custom-divisor` end up lower than intended (and lower than the prior value). So reintroduce the old bound as a lower bound to avoid potentially regressing code where we don't have complete information to make the decision. Reviewed-by: DJ Delorie <dj@redhat.com>
2023-07-18x86: Fix slight bug in `shared_per_thread` cache size calculation.Noah Goldstein1-2/+2
After: ``` commit af992e7abdc9049714da76cae1e5e18bc4838fb8 Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Wed Jun 7 13:18:01 2023 -0500 x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` ``` Split `shared` (cumulative cache size) from `shared_per_thread` (cache size per socket), the `shared_per_thread` *can* be slightly off from the previous calculation. Previously we added `core` even if `threads_l2` was invalid, and only used `threads_l2` to divide `core` if it was present. The changed version only included `core` if `threads_l2` was valid. This change restores the old behavior if `threads_l2` is invalid by adding the entire value of `core`. Reviewed-by: DJ Delorie <dj@redhat.com>
2023-07-17configure: Use autoconf 2.71Siddhesh Poyarekar1-46/+52
Bump autoconf requirement to 2.71 to allow regenerating configure on more recent distributions. autoconf 2.71 has been in Fedora since F36 and is the current version in Debian stable (bookworm). It appears to be current in Gentoo as well. All sysdeps configure and preconfigure scripts have also been regenerated; all changes are trivial transformations that do not affect functionality. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-06-26x86: Make dl-cache.h and readelflib.c not Linux-specificSergey Bugaev1-0/+90
These files could be useful to any port that wants to use ld.so.cache. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2023-06-19Fix misspellings -- BZ 25337Paul Pluzhnikov2-2/+2
2023-06-12x86: Make the divisor in setting `non_temporal_threshold` cpu specificNoah Goldstein4-26/+51
Different systems prefer a different divisors. From benchmarks[1] so far the following divisors have been found: ICX : 2 SKX : 2 BWD : 8 For Intel, we are generalizing that BWD and older prefers 8 as a divisor, and SKL and newer prefers 2. This number can be further tuned as benchmarks are run. [1]: https://github.com/goldsteinn/memcpy-nt-benchmarks Reviewed-by: DJ Delorie <dj@redhat.com>
2023-06-12x86: Refactor Intel `init_cpu_features`Noah Goldstein1-81/+309
This patch should have no affect on existing functionality. The current code, which has a single switch for model detection and setting prefered features, is difficult to follow/extend. The cases use magic numbers and many microarchitectures are missing. This makes it difficult to reason about what is implemented so far and/or how/where to add support for new features. This patch splits the model detection and preference setting stages so that CPU preferences can be set based on a complete list of available microarchitectures, rather than based on model magic numbers. Reviewed-by: DJ Delorie <dj@redhat.com>
2023-06-12x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4`Noah Goldstein1-27/+43
Current `non_temporal_threshold` set to roughly '3/4 * sizeof_L3 / ncores_per_socket'. This patch updates that value to roughly 'sizeof_L3 / 4` The original value (specifically dividing the `ncores_per_socket`) was done to limit the amount of other threads' data a `memcpy`/`memset` could evict. Dividing by 'ncores_per_socket', however leads to exceedingly low non-temporal thresholds and leads to using non-temporal stores in cases where REP MOVSB is multiple times faster. Furthermore, non-temporal stores are written directly to main memory so using it at a size much smaller than L3 can place soon to be accessed data much further away than it otherwise could be. As well, modern machines are able to detect streaming patterns (especially if REP MOVSB is used) and provide LRU hints to the memory subsystem. This in affect caps the total amount of eviction at 1/cache_associativity, far below meaningfully thrashing the entire cache. As best I can tell, the benchmarks that lead this small threshold where done comparing non-temporal stores versus standard cacheable stores. A better comparison (linked below) is to be REP MOVSB which, on the measure systems, is nearly 2x faster than non-temporal stores at the low-end of the previous threshold, and within 10% for over 100MB copies (well past even the current threshold). In cases with a low number of threads competing for bandwidth, REP MOVSB is ~2x faster up to `sizeof_L3`. The divisor of `4` is a somewhat arbitrary value. From benchmarks it seems Skylake and Icelake both prefer a divisor of `2`, but older CPUs such as Broadwell prefer something closer to `8`. This patch is meant to be followed up by another one to make the divisor cpu-specific, but in the meantime (and for easier backporting), this patch settles on `4` as a middle-ground. Benchmarks comparing non-temporal stores, REP MOVSB, and cacheable stores where done using: https://github.com/goldsteinn/memcpy-nt-benchmarks Sheets results (also available in pdf on the github): https://docs.google.com/spreadsheets/d/e/2PACX-1vS183r0rW_jRX6tG_E90m9qVuFiMbRIJvi5VAE8yYOvEOIEEc3aSNuEsrFbuXw5c3nGboxMmrupZD7K/pubhtml Reviewed-by: DJ Delorie <dj@redhat.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-06-02Fix a few more typos I missed in previous round -- BZ 25337Paul Pluzhnikov1-1/+1
2023-05-30Fix misspellings in sysdeps/ -- BZ 25337Paul Pluzhnikov3-3/+3