path: root/sysdeps/x86
Age | Commit message | Author | Files | Lines
3 days | x86: Don't use asm statement for trunc/truncf | H.J. Lu | 3 | -12/+91
Compiler inlines trunc and truncf with SSE4.1. But older versions of GCC don't inline them with -Os:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121861

Don't use asm statement for trunc and truncf if compiler can inline them with -Os. It removes one register move with GCC 16:

__modff_sse41:                  __modff_sse41:
.LFB23:                         .LFB23:
.cfi_startproc                  .cfi_startproc
endbr64                         endbr64
subq $24, %rsp                  subq $24, %rsp
.cfi_def_cfa_offset 32          .cfi_def_cfa_offset 32
movq %fs:40, %rax               movq %fs:40, %rax
movq %rax, 8(%rsp)              movq %rax, 8(%rsp)
xorl %eax, %eax                 xorl %eax, %eax
movd %xmm0, %eax                movd %xmm0, %eax
addl %eax, %eax                 addl %eax, %eax
cmpl $-16777216, %eax           cmpl $-16777216, %eax
je .L7                          je .L7
                              > movaps %xmm0, %xmm3
movaps %xmm0, %xmm4             movaps %xmm0, %xmm4
movss .LC0(%rip), %xmm2       | movss .LC0(%rip), %xmm1
movaps %xmm2, %xmm3           | movaps %xmm1, %xmm2
andps %xmm0, %xmm2            | roundss $11, %xmm3, %xmm3
roundss $11, %xmm0, %xmm1     | subss %xmm3, %xmm4
subss %xmm1, %xmm4            | andps %xmm0, %xmm1
andnps %xmm4, %xmm3           | andnps %xmm4, %xmm2
orps %xmm3, %xmm2             | orps %xmm2, %xmm1
.L3:                            .L3:
movss %xmm1, (%rdi)           | movss %xmm3, (%rdi)
movq 8(%rsp), %rax              movq 8(%rsp), %rax
subq %fs:40, %rax               subq %fs:40, %rax
jne .L8                         jne .L8
movaps %xmm2, %xmm0           | movaps %xmm1, %xmm0
addq $24, %rsp                  addq $24, %rsp
.cfi_remember_state             .cfi_remember_state
.cfi_def_cfa_offset 8           .cfi_def_cfa_offset 8
ret                             ret

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Uros Bizjak <ubizjak@gmail.com>
9 days | math: Fix x86_64 build for -Os (BZ 33367) | Adhemerval Zanella | 1 | -0/+27
The compiler might not inline the trunc function call for USE_TRUNC_BUILTIN [1]. This patch adds an optimized __trunc/__truncf for x86, used by the modf ifunc variants, to avoid the trunc libcall. Checked on x86_64, x86_64-v2, x86_64-v3, and x86_64-v4. Used -O2 and -Os options. Performed a full make check on x86_64 with both optimizations. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121861 Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
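To illustrate why modff depends on a cheap trunc, here is a minimal sketch of a modff-style split. It is not glibc's code: `sketch_truncf` and `sketch_modff` are hypothetical names, and the truncation is done with portable bit masking as a stand-in for the single ROUNDSS instruction that the compiler emits when it inlines truncf with SSE4.1.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an inlined truncf: clear the fractional
   mantissa bits.  With SSE4.1 the compiler can do this in one ROUNDSS.  */
static float sketch_truncf(float x)
{
  uint32_t bits;
  memcpy(&bits, &x, sizeof bits);
  int exponent = (int) ((bits >> 23) & 0xff) - 127;
  if (exponent < 0)
    bits &= 0x80000000u;                   /* |x| < 1: keep only the sign */
  else if (exponent < 23)
    bits &= ~(0x007fffffu >> exponent);    /* drop the fraction bits */
  /* exponent >= 23: already integral, infinite, or NaN */
  memcpy(&x, &bits, sizeof x);
  return x;
}

/* Hypothetical modff: store the integral part in *iptr and return the
   fractional part carrying the sign of x.  */
float sketch_modff(float x, float *iptr)
{
  uint32_t bits, sign, frac_bits;
  memcpy(&bits, &x, sizeof bits);
  sign = bits & 0x80000000u;

  if ((uint32_t) (bits << 1) == 0xff000000u)
    {
      /* +-infinity: integral part is x, fraction is a signed zero */
      float zero;
      memcpy(&zero, &sign, sizeof zero);
      *iptr = x;
      return zero;
    }

  float integral = sketch_truncf(x);
  float frac = x - integral;
  *iptr = integral;

  /* copysign (frac, x), so that e.g. modff (-2.0f) returns -0.0f */
  memcpy(&frac_bits, &frac, sizeof frac_bits);
  frac_bits = (frac_bits & 0x7fffffffu) | sign;
  memcpy(&frac, &frac_bits, sizeof frac);
  return frac;
}
```

The infinity test mirrors the `addl %eax, %eax` / `cmpl $-16777216, %eax` pair visible in the `__modff_sse41` listing of the newer commit; everything else is an assumption made for illustration.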
10 days | x86: Remove x86 version of thread_pointer.h | Uros Bizjak | 1 | -30/+0
The x86 version of thread_pointer.h is the same as the generic one. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: H.J.Lu <hjl.tools@gmail.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Carlos O'Donell <carlos@redhat.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
10 days | x86: Remove stale __GNUC_PREREQ (11, 1) test from __thread_pointer() | Uros Bizjak | 1 | -10/+0
GCC 12 is currently the minimum supported compiler version. Remove the no-longer-needed __GNUC_PREREQ (11, 1) test from __thread_pointer(). Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: H.J.Lu <hjl.tools@gmail.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Carlos O'Donell <carlos@redhat.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
11 days | x86: Define atomic_compare_and_exchange_{val, bool}_acq using __atomic_compare_exchange_n | Uros Bizjak | 1 | -4/+14
No functional changes. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Cc: Collin Funk <collin.funk1@gmail.com> Cc: H.J.Lu <hjl.tools@gmail.com> Cc: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
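As a rough illustration of how compare-and-exchange helpers can sit on top of `__atomic_compare_exchange_n`, consider the sketch below. The `sketch_*` names, the `int` operand type, and the chosen memory orders are assumptions; glibc's real macros are type-generic and the log does not show their exact definitions.

```c
#include <stdbool.h>

/* Hypothetical "val" flavour: returns the value that was in *mem
   before the operation, whether or not the exchange happened.  */
static inline int sketch_cas_val_acq(int *mem, int newval, int oldval)
{
  int expected = oldval;
  /* On failure the builtin writes the current *mem into 'expected';
     on success 'expected' still equals the old contents.  */
  __atomic_compare_exchange_n(mem, &expected, newval, 0 /* strong */,
                              __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
  return expected;
}

/* Hypothetical "bool" flavour, following glibc's convention of
   returning zero when the exchange took place.  */
static inline bool sketch_cas_bool_acq(int *mem, int newval, int oldval)
{
  int expected = oldval;
  return !__atomic_compare_exchange_n(mem, &expected, newval, 0,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}
```

Because the builtin reports success directly, the "bool" form needs no comparison of the returned value against `oldval`, so the compiler can branch on the flags set by CMPXCHG.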
11 days | x86: Define atomic_exchange_acq using __atomic_exchange_n | Uros Bizjak | 1 | -28/+1
The resulting libc.so is identical on both x86_64 and i386 targets compared to unpatched builds: $ sha1sum libc-x86_64-old.so libc-x86_64-new.so 74eca1b87f2ecc9757a984c089a582b7615d93e7 libc-x86_64-old.so 74eca1b87f2ecc9757a984c089a582b7615d93e7 libc-x86_64-new.so $ sha1sum libc-i386-old.so libc-i386-new.so 882bbab8324f79f4fbc85224c4c914fc6822ece7 libc-i386-old.so 882bbab8324f79f4fbc85224c4c914fc6822ece7 libc-i386-new.so Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Cc: Collin Funk <collin.funk1@gmail.com> Cc: H.J.Lu <hjl.tools@gmail.com> Cc: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
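A minimal sketch of an exchange helper built on `__atomic_exchange_n` follows. The names and the `int` type are hypothetical, and the try-lock is only a usage illustration, not something taken from glibc.

```c
#include <stdbool.h>

/* Hypothetical acquire-ordered exchange: store newval, return the
   previous contents of *mem.  On x86 this typically becomes XCHG.  */
static inline int sketch_exchange_acq(int *mem, int newval)
{
  return __atomic_exchange_n(mem, newval, __ATOMIC_ACQUIRE);
}

/* Usage illustration: a test-and-set try-lock.  The lock was free
   (and is now ours) if the previous value was 0.  */
static inline bool sketch_try_lock(int *lock)
{
  return sketch_exchange_acq(lock, 1) == 0;
}
```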
11 days | x86: Define atomic_full_barrier using __sync_synchronize | Uros Bizjak | 1 | -6/+2
For x86_64 targets, __sync_synchronize emits a full 64-bit 'LOCK ORQ $0x0,(%rsp)' instead of 'LOCK ORL $0x0,(%rsp)'. No functional changes. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Cc: Collin Funk <collin.funk1@gmail.com> Cc: H.J.Lu <hjl.tools@gmail.com> Cc: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
11 days | x86: Remove catomic_* locking primitives | Uros Bizjak | 1 | -108/+4
Remove obsolete catomic_* locking primitives which don't map to standard compiler builtins. There are still a couple of places in the tree that use them (malloc/arena.c and malloc/malloc.c).

x86 didn't define __arch_c_compare_and_exchange_bool_* primitives so fallback code used __arch_c_compare_and_exchange_val_* primitives instead. This resulted in suboptimal code for catomic_compare_and_exchange_bool_acq where a superfluous CMP was emitted after CMPXCHG, e.g. in arena_get2:

  775b8: 48 8d 4a 01             lea 0x1(%rdx),%rcx
  775bc: 48 89 d0                mov %rdx,%rax
  775bf: 64 83 3c 25 18 00 00    cmpl $0x0,%fs:0x18
  775c6: 00 00
  775c8: 74 01                   je 775cb <arena_get2+0x35b>
  775ca: f0 48 0f b1 0d 75 3d    lock cmpxchg %rcx,0x163d75(%rip) # 1db348 <narenas>
  775d1: 16 00
  775d3: 48 39 c2                cmp %rax,%rdx
  775d6: 74 7f                   je 77657 <arena_get2+0x3e7>

that now becomes:

  775b8: 48 8d 4a 01             lea 0x1(%rdx),%rcx
  775bc: 48 89 d0                mov %rdx,%rax
  775bf: f0 48 0f b1 0d 80 3d    lock cmpxchg %rcx,0x163d80(%rip) # 1db348 <narenas>
  775c6: 16 00
  775c8: 74 7f                   je 77649 <arena_get2+0x3d9>

OTOH, catomic_decrement does not fall back to the atomic_fetch_add (, -1) builtin but to the cmpxchg loop, so the generated code in arena_get2 regresses a bit, from using LOCK DECQ insn:

  77829: 64 83 3c 25 18 00 00    cmpl $0x0,%fs:0x18
  77830: 00 00
  77832: 74 01                   je 77835 <arena_get2+0x5c5>
  77834: f0 48 ff 0d 0c 3b 16    lock decq 0x163b0c(%rip) # 1db348 <narenas>
  7783b: 00

to a cmpxchg loop:

  7783d: 48 8b 0d 04 3b 16 00    mov 0x163b04(%rip),%rcx # 1db348 <narenas>
  77844: 48 8d 71 ff             lea -0x1(%rcx),%rsi
  77848: 48 89 c8                mov %rcx,%rax
  7784b: f0 48 0f b1 35 f4 3a    lock cmpxchg %rsi,0x163af4(%rip) # 1db348 <narenas>
  77852: 16 00
  77854: 0f 84 c9 fa ff ff       je 77323 <arena_get2+0xb3>
  7785a: eb e1                   jmp 7783d <arena_get2+0x5cd>

Defining catomic_exchange_and_add using __atomic_fetch_add solves the above issue and generates optimal:

  77809: f0 48 83 2d 36 3b 16    lock subq $0x1,0x163b36(%rip) # 1db348 <narenas>
  77810: 00 01

Depending on the target processor, the compiler may emit either a 'LOCK ADD/SUB $1, m' or a 'LOCK INC/DEC m' instruction, due to the partial flag register stall issue.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Cc: Collin Funk <collin.funk1@gmail.com>
Cc: H.J.Lu <hjl.tools@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
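The difference between the two decrement strategies described in this commit can be sketched as follows. The function names, the `long` type, and the memory orders are assumptions for illustration; only the use of `__atomic_fetch_add` comes from the commit message.

```c
/* Hypothetical exchange-and-add on top of the builtin.  When the
   result is unused, the compiler is free to emit a single locked
   ADD/SUB (or INC/DEC) instead of XADD.  */
static inline long sketch_exchange_and_add(long *mem, long value)
{
  return __atomic_fetch_add(mem, value, __ATOMIC_ACQUIRE);
}

/* Decrement expressed through fetch-add: one locked instruction.  */
static inline void sketch_decrement(long *mem)
{
  (void) sketch_exchange_and_add(mem, -1);
}

/* The fallback shape the commit message calls a regression:
   a load followed by a compare-and-exchange retry loop.  */
static inline void sketch_decrement_cas_loop(long *mem)
{
  long old = __atomic_load_n(mem, __ATOMIC_RELAXED);
  while (!__atomic_compare_exchange_n(mem, &old, old - 1, 1 /* weak */,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
    ;  /* 'old' was refreshed by the failed attempt; retry */
}
```

Both functions leave the same value in memory; they differ only in the code the compiler can generate for them.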
11 days | x86: Remove unused atomics | Uros Bizjak | 1 | -353/+0
Remove unused atomics from <sysdeps/x86/atomic-machine.h>. The resulting libc.so is identical on both x86_64 and i386 targets compared to unpatched builds: $ sha1sum libc-x86_64-old.so libc-x86_64-new.so b89aaa2b71efd435104ebe6f4cd0f2ef89fcac90 libc-x86_64-old.so b89aaa2b71efd435104ebe6f4cd0f2ef89fcac90 libc-x86_64-new.so $ sha1sum libc-i386-old.so libc-i386-new.so aa70f2d64da2f0f516634b116014cfe7af3e5b1a libc-i386-old.so aa70f2d64da2f0f516634b116014cfe7af3e5b1a libc-i386-new.so Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Cc: Collin Funk <collin.funk1@gmail.com> Cc: H.J.Lu <hjl.tools@gmail.com> Cc: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
12 days | x86: Include <bits/stdlib-bsearch.h> in dl-cacheinfo.h | H.J. Lu | 1 | -2/+13
On x86-64, when glibc is configured with --enable-stack-protector=all and compiled with -Os, ld.so crashes very early:

(gdb) r --direct
Starting program: /export/build/gnu/tools-build/glibc-gitlab/build-x86_64-linux/string/test-memswap --direct

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7f41b0a in bsearch (__key=__key@entry=0x7fffffffda28, __base=__base@entry=0x7ffff7fca140 <intel_02_known>, __nmemb=__nmemb@entry=68, __size=__size@entry=8, __compar=__compar@entry=0x7ffff7f3b691 <intel_02_known_compare>) at ../bits/stdlib-bsearch.h:22
22 {
(gdb) disass
Dump of assembler code for function bsearch:
   0x00007ffff7f41af0 <+0>: push %r15
   0x00007ffff7f41af2 <+2>: mov %rcx,%r15
   0x00007ffff7f41af5 <+5>: push %r14
   0x00007ffff7f41af7 <+7>: push %r13
   0x00007ffff7f41af9 <+9>: mov %rsi,%r13
   0x00007ffff7f41afc <+12>: push %r12
   0x00007ffff7f41afe <+14>: mov %rdi,%r12
   0x00007ffff7f41b01 <+17>: push %rbp
   0x00007ffff7f41b02 <+18>: mov %rdx,%rbp
   0x00007ffff7f41b05 <+21>: push %rbx
   0x00007ffff7f41b06 <+22>: sub $0x18,%rsp
=> 0x00007ffff7f41b0a <+26>: mov %fs:0x28,%r14
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We can't use stack protector at this point.
   0x00007ffff7f41b13 <+35>: mov %r14,0x8(%rsp)
   0x00007ffff7f41b18 <+40>: mov %r8,%r14
   0x00007ffff7f41b1b <+43>: test %rbp,%rbp
   0x00007ffff7f41b1e <+46>: je 0x7ffff7f41b48 <bsearch+88>
   0x00007ffff7f41b20 <+48>: mov %rbp,%rbx
   0x00007ffff7f41b23 <+51>: mov %r12,%rdi
   0x00007ffff7f41b26 <+54>: shr $1,%rbx
   0x00007ffff7f41b29 <+57>: imul %r15,%rbx
   0x00007ffff7f41b2d <+61>: add %r13,%rbx
   0x00007ffff7f41b30 <+64>: mov %rbx,%rsi
(gdb) bt
#0 0x00007ffff7f41b0a in bsearch (__key=__key@entry=0x7fffffffda28, __base=__base@entry=0x7ffff7fca140 <intel_02_known>, __nmemb=__nmemb@entry=68, __size=__size@entry=8, __compar=__compar@entry=0x7ffff7f3b691 <intel_02_known_compare>) at ../bits/stdlib-bsearch.h:22
#1 0x00007ffff7f3c1be in intel_check_word (name=188, value=1979933440, has_level_2=has_level_2@entry=0x7fffffffda7f, no_level_2_or_3=no_level_2_or_3@entry=0x7fffffffda7e, cpu_features=<optimized out>) at ../sysdeps/x86/dl-cacheinfo.h:217
#2 0x00007ffff7f3c29f in handle_intel (name=name@entry=188, cpu_features=<optimized out>) at ../sysdeps/x86/dl-cacheinfo.h:279
#3 0x00007ffff7f3ccf9 in dl_init_cacheinfo (cpu_features=<optimized out>) at ../sysdeps/x86/dl-cacheinfo.h:852
#4 init_cpu_features (cpu_features=<optimized out>) at ../sysdeps/x86/cpu-features.c:1153
#5 0x00007ffff7f3d6f9 in __libc_start_main_impl (main=0x7ffff7f396dc <main>, argc=2, argv=0x7fffffffdbe8, init=<optimized out>, fini=<optimized out>, rtld_fini=0x0, stack_end=0x7fffffffdbd8) at ../csu/libc-start.c:269
#6 0x00007ffff7f39901 in _start () at ../sysdeps/x86_64/start.S:115
(gdb)

The problem is that since __USE_EXTERN_INLINES isn't defined with -Os, the inline bsearch in <bits/stdlib-bsearch.h> isn't available and the external bsearch is compiled with stack protector. Including <bits/stdlib-bsearch.h> in dl-cacheinfo.h fixes BZ #33374.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
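The idea behind the fix is that a binary search defined inline in the including file is compiled with that file's own flags, so early startup code does not call into an external bsearch built with the stack protector. A minimal sketch of such an inline search is shown below; `sketch_bsearch` and `sketch_compare_uint` are hypothetical names, and this is not the text of <bits/stdlib-bsearch.h>.

```c
#include <stddef.h>

/* Hypothetical inline binary search with the same contract as
   bsearch: returns a pointer to a matching element or NULL.  */
static inline void *
sketch_bsearch(const void *key, const void *base, size_t nmemb,
               size_t size, int (*compar) (const void *, const void *))
{
  size_t lo = 0, hi = nmemb;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      const void *elem = (const char *) base + mid * size;
      int cmp = compar(key, elem);
      if (cmp < 0)
        hi = mid;          /* key sorts before elem */
      else if (cmp > 0)
        lo = mid + 1;      /* key sorts after elem */
      else
        return (void *) elem;
    }
  return NULL;
}

/* Hypothetical comparator for a sorted table of unsigned ints.  */
static int sketch_compare_uint(const void *a, const void *b)
{
  unsigned int x = *(const unsigned int *) a;
  unsigned int y = *(const unsigned int *) b;
  return (x > y) - (x < y);
}
```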
2025-08-29 | x86: Use flag output operands for inline asm in atomic-machine.h | Uros Bizjak | 1 | -48/+48
Use the flag output constraints feature available in gcc 6+ ("=@cc<cond>") instead of explicitly setting a boolean variable with SETcc instruction. This approach decouples the instruction that sets the flags from the code that consumes them, allowing the compiler to create better code when working with flags users. Instead of e.g.: lock add %esi,(%rdi) sets %sil test %sil,%sil jne <...> the compiler now generates: lock add %esi,(%rdi) js <...> No functional changes intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: H.J.Lu <hjl.tools@gmail.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>
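A small sketch of the flag-output technique, assuming an atomic "add and report whether the result is negative" helper: the function name is hypothetical, and the non-x86 fallback exists only so the example builds elsewhere. The `"=@ccs"` constraint asks the compiler to read the sign flag directly rather than materialising it with SETS.

```c
/* Hypothetical helper: atomically add 'value' to *mem and return
   nonzero if the result is negative.  */
static inline int sketch_add_is_negative(int *mem, int value)
{
#if defined(__GCC_ASM_FLAG_OUTPUTS__) \
    && (defined(__x86_64__) || defined(__i386__))
  int negative;
  /* The sign flag left by the locked ADD is exposed as an output
     operand, so a following branch can consume the flags directly.  */
  __asm__ __volatile__ ("lock addl %2, %0"
                        : "+m" (*mem), "=@ccs" (negative)
                        : "ir" (value)
                        : "memory");
  return negative;
#else
  /* Portable fallback for compilers or targets without flag outputs.  */
  return __atomic_add_fetch(mem, value, __ATOMIC_SEQ_CST) < 0;
#endif
}
```

`__GCC_ASM_FLAG_OUTPUTS__` is the macro compilers define when flag output operands are available, which makes the guard safe on older toolchains.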
2025-08-27 | x86/configure: Improve portability of isa level check | Henrik Lindström | 2 | -2/+2
wc -l pads the output with leading spaces on some systems, e.g. FreeBSD. This results in the check `test "$count" = 1` failing. Use -eq for integer comparison instead. Signed-off-by: Henrik Lindström <henrik@lxm.se> Reviewed-by: Arjun Shankar <arjun@redhat.com>
2025-08-23 | Don't pass -c to LIBC_TRY_TEST_CC_OPTION | H.J. Lu | 2 | -2/+2
LIBC_TRY_TEST_CC_OPTION is defined with LIBC_TRY_CC_OPTION: dnl Test a compiler option or options with an empty input file. dnl LIBC_TRY_CC_OPTION([options], [action-if-true], [action-if-false]) AC_DEFUN([LIBC_TRY_CC_OPTION], [AS_IF([AC_TRY_COMMAND([${CC-cc} $1 -xc /dev/null -S -o /dev/null])], [$2], [$3])]) which passes -S to compiler. Unlike gcc, when -c is also passed to clang 20, we get configure:7838: clang -c -Werror -fsemantic-interposition -xc /dev/null -S -o /dev/null clang: error: argument unused during compilation: '-c' [-Werror,-Wunused-command-line-argument] Don't pass -c to LIBC_TRY_TEST_CC_OPTION since -c isn't needed. This fixes BZ #33318. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>
2025-08-22 | x86: Set have-protected-data to no if unsupported | H.J. Lu | 3 | -0/+207
If the building compiler enables no direct external data access by default, access to protected data in shared libraries from executables must be compiled with no direct external data access. If the testing compiler doesn't support it, set have-protected-data to no to disable the tests which require no direct external data access. Add LIBC_TRY_CC_COMMAND to test a building compiler option or options with an input file. This fixes BZ #33286. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>
2025-08-18 | i386: Also add GLIBC_ABI_GNU2_TLS version [BZ #33129] | H.J. Lu | 2 | -0/+14
Since the GNU2 TLS run-time bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31372 affects both i386 and x86-64, also add GLIBC_ABI_GNU2_TLS version to i386 to indicate the working GNU2 TLS run-time. For x86-64, the additional GNU2 TLS run-time bug fix is needed for https://sourceware.org/bugzilla/show_bug.cgi?id=31501 Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>
2025-07-18 | x86-64: Properly compile ISA optimized modf and modff | H.J. Lu | 2 | -0/+6
There are 3 variants of modf and modff: SSE2, SSE4.1 and AVX. s_modf.c and s_modff.c include the generic implementation compiled with the minimum x86 ISA level. The IFUNC selector is used only if the minimum ISA level is less than AVX. The SSE4.1 variant is included only if the ISA level is less than SSE4.1. The AVX variant is included only if the ISA level is less than AVX. The AVX variant should be compiled with -mavx, not -msse2avx -DSSE2AVX which are used to encode SSE assembly sources with EVEX encoding. The routines that are shared between libc and libm should use different rules to avoid using the same MODULE_NAME, to avoid potential issues like BZ #33165 where __stack_chk_fail is not routed to the internal symbol. Tested with -march=x86-64, -march=x86-64-v2, -march=x86-64-v3 and -march=x86-64-v4. This fixes BZ #33165 and BZ #33173. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-07-09 | x86: Avoid vector/r16-r31 registers and memcpy/memset in mcount_internal | H.J. Lu | 1 | -0/+4
Since mcount_internal is called from mcount/__fentry__ which preserve only RAX, RCX, RDX, RSI, RDI, R8 and R9, compile mcount.c with -fno-tree-loop-distribute-patterns -mgeneral-regs-only -mno-apxf to avoid vector/r16-r31 registers and memcpy/memset in mcount_internal. This fixes BZ #33134. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Andreas K. Huettel <dilfridge@gentoo.org>
2025-06-19 | x86: Update tst-gnu2-tls2 tests | H.J. Lu | 6 | -22/+73
Update tst-gnu2-tls2 tests to set XMM0...XMM7 to all 1s in malloc to verify that XMM registers are preserved when _dl_tlsdesc_dynamic is called by clearing vectors with zeroed XMM registers before _dl_tlsdesc_dynamic and using these XMM registers to clear vectors after _dl_tlsdesc_dynamic. This improves the BZ #31372 test. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>
2025-06-19 | i386: Update ___tls_get_addr to preserve vector registers | H.J. Lu | 4 | -1/+95
Compiler generates the following instruction sequence for dynamic TLS access:

  leal tls_var@tlsgd(,%ebx,1), %eax
  call ___tls_get_addr@PLT

CALL instruction is transparent to compiler which assumes all registers, except for EFLAGS, AX, CX, and DX, are unchanged after CALL. But ___tls_get_addr is a normal function which doesn't preserve any vector registers.

1. Rename the generic __tls_get_addr function to ___tls_get_addr_internal.
2. Change ___tls_get_addr to a wrapper function with implementations for FNSAVE, FXSAVE, XSAVE and XSAVEC to save and restore all vector registers.
3. dl-tlsdesc-dynamic.h has:

   _dl_tlsdesc_dynamic:
     /* Like all TLS resolvers, preserve call-clobbered registers.
        We need two scratch regs anyway. */
     subl $32, %esp
     cfi_adjust_cfa_offset (32)

   It is wrong to use

     movl %ebx, -28(%esp)
     movl %esp, %ebx
     cfi_def_cfa_register(%ebx)
     ...
     mov %ebx, %esp
     cfi_def_cfa_register(%esp)
     movl -28(%esp), %ebx

   to preserve EBX on stack. Fix it with:

     movl %ebx, 28(%esp)
     movl %esp, %ebx
     cfi_def_cfa_register(%ebx)
     ...
     mov %ebx, %esp
     cfi_def_cfa_register(%esp)
     movl 28(%esp), %ebx

4. Update _dl_tlsdesc_dynamic to call ___tls_get_addr_internal directly.
5. Add have-test-mtls-traditional to compile tst-tls23-mod.c with traditional TLS variant to verify the fix.
6. Define DL_RUNTIME_RESOLVE_REALIGN_STACK in sysdeps/x86/sysdep.h.

This fixes BZ #32996.

Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-06-09 | x86: Avoid GLRO(dl_x86_cpu_features) | H.J. Lu | 1 | -1/+1
In init_cpu_features, replace GLRO(dl_x86_cpu_features) with cpu_features to avoid an extra load. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-04-12 | x86: Detect Intel Diamond Rapids | H.J. Lu | 1 | -0/+12
Detect Intel Diamond Rapids and tune it similar to Intel Granite Rapids. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2025-04-11 | x86: Handle unknown Intel processor with default tuning | Sunil K Pandey | 1 | -144/+143
Enable default tuning for unknown Intel processor. Tested on x86, no regression. Co-Authored-By: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-04-10 | x86: Add ARL/PTL/CWF model detection support | Sunil K Pandey | 1 | -0/+10
- Add ARROWLAKE model detection.
- Add PANTHERLAKE model detection.
- Add CLEARWATERFOREST model detection.

Intel® Architecture Instruction Set Extensions Programming Reference https://cdrdv2.intel.com/v1/dl/getContent/671368 Section 1.2.

No regression, validated model detection on SDE.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-04-05 | x86: Optimize xstate size calculation | Sunil K Pandey | 2 | -56/+24
Scan xstate IDs up to the maximum supported xstate ID. Remove the separate AMX xstate calculation. Instead, exclude the AMX space from the start of TILECFG to the end of TILEDATA in xsave_state_size. Completed validation on SKL/SKX/SPR/SDE and compared xsave state size with "ld.so --list-diagnostics" option, no regression. Co-Authored-By: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2025-03-31 | x86: Link tst-gnu2-tls2-x86-noxsave{,c,xsavec} with libpthread | Florian Weimer | 1 | -0/+3
This fixes a test build failure on Hurd. Fixes commit 145097dff170507fe73190e8e41194f5b5f7e6bf ("x86: Use separate variable for TLSDESC XSAVE/XSAVEC state size (bug 32810)"). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-03-31 | Fix typo in comment | YLK | 1 | -1/+1
2025-03-29 | x86: Use separate variable for TLSDESC XSAVE/XSAVEC state size (bug 32810) | Florian Weimer | 8 | -7/+39
Previously, the initialization code reused the xsave_state_full_size member of struct cpu_features for the TLSDESC state size. However, the tunable processing code assumes that this member has the original XSAVE (non-compact) state size, so that it can use its value if XSAVEC is disabled via tunable. This change uses a separate variable and not a struct member because the value is only needed in ld.so and the static libc, but not in libc.so. As a result, struct cpu_features layout does not change, helping a future backport of this change. Fixes commit 9b7091415af47082664717210ac49d51551456ab ("x86-64: Update _dl_tlsdesc_dynamic to preserve AMX registers"). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-03-29 | x86: Skip XSAVE state size reset if ISA level requires XSAVE | Florian Weimer | 1 | -0/+5
If we have to use XSAVE or XSAVEC trampolines, do not adjust the size information they need. Technically, it is an operator error to try to run with -XSAVE,-XSAVEC on such builds, but this change here disables some unnecessary code with higher ISA levels and simplifies testing. Related to commit befe2d3c4dec8be2cdd01a47132e47bdb7020922 ("x86-64: Don't use SSE resolvers for ISA level 3 or above"). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-03-05 | Remove dl-procinfo.h | Adhemerval Zanella | 1 | -3/+0
powerpc was the only architecture with arch-specific hooks for LD_SHOW_AUXV, and with the information moved to ld diagnostics there is no need to keep the _dl_procinfo hook. Checked with a build for all affected ABIs. Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2025-02-28 | Remove unused dl-procinfo.h | Wilco Dijkstra | 3 | -49/+0
Remove unused _dl_hwcap_string defines. As a result many dl-procinfo.h headers can be removed. This also removes target specific _dl_procinfo implementations which only printed HWCAP strings using dl_hwcap_string. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-02-23 | math: Fix `unknown type name '__float128'` for clang 3.4 to 3.8.1 (bug 32694) | koraynilay | 1 | -2/+2
When compiling a program that includes <bits/floatn.h> using a clang version between 3.4 and 3.8.1 (both inclusive), clang will fail with `unknown type name '__float128'; did you mean '__cfloat128'?`. This change fixes the clang prerequisite macro call in floatn.h to check for clang 3.9 instead of 3.4, since support for __float128 was actually enabled in 3.9 by:

  commit 50f29e06a1b6a38f0bba9360cbff72c82d46cdd4
  Author: Nemanja Ivanovic <nemanja.i.ibm@gmail.com>
  Date: Wed Apr 13 09:49:45 2016 +0000

  Enable support for __float128 in Clang

This fixes bug 32694.

Signed-off-by: koraynilay <koray.fra@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
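The kind of version gate involved can be sketched as below. `sketch_version_at_least` and `SKETCH_CLANG_PREREQ` are hypothetical names chosen for illustration; they only mimic the usual "pack major and minor into one integer" comparison and are not copied from glibc's headers.

```c
#include <stdbool.h>

/* Hypothetical helper: true if (major, minor) is at least
   (req_major, req_minor).  */
static inline bool
sketch_version_at_least(unsigned int major, unsigned int minor,
                        unsigned int req_major, unsigned int req_minor)
{
  return ((major << 16) + minor) >= ((req_major << 16) + req_minor);
}

/* Hypothetical preprocessor form of the same test for Clang.  */
#if defined(__clang_major__) && defined(__clang_minor__)
# define SKETCH_CLANG_PREREQ(maj, min) \
  (((__clang_major__ << 16) + __clang_minor__) >= (((maj) << 16) + (min)))
#else
# define SKETCH_CLANG_PREREQ(maj, min) 0
#endif
```

With a threshold of 3.4, clang 3.8 passes the gate even though it lacks `__float128`; raising the threshold to 3.9 excludes it, which is the behaviour the commit describes.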
2025-02-20 | x86 (__HAVE_FLOAT128): Defined to 0 for Intel SYCL compiler [BZ #32723] | H.J. Lu | 1 | -2/+6
Intel compiler always defines __INTEL_LLVM_COMPILER. When SYCL is enabled by -fsycl, it also defines SYCL_LANGUAGE_VERSION. Since Intel SYCL compiler doesn't support _Float128: https://github.com/intel/llvm/issues/16903 define __HAVE_FLOAT128 to 0 for Intel SYCL compiler. This fixes BZ #32723. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>
2025-01-09 | x86: Add missing #include <features.h> to <thread_pointer.h> | Florian Weimer | 1 | -0/+2
It is required for __GNUC_PREREQ. Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
2025-01-09 | Move <thread_pointer.h> to kernel-independent sysdeps directories | Florian Weimer | 1 | -0/+0
Hurd is expected to use the same thread ABI as Linux. Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
2025-01-01 | Update copyright dates with scripts/update-copyrights | Paul Eggert | 107 | -107/+107
2024-12-23 | include/sys/cdefs.h: Add __attribute_optimization_barrier__ | Adhemerval Zanella | 12 | -25/+25
Add __attribute_optimization_barrier__ to disable inlining and cloning on a function. For Clang, expand it to __attribute__ ((optnone)). Otherwise, expand it to __attribute__ ((noinline, noclone)). Co-Authored-By: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>
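A minimal sketch of such a macro is shown below, under the assumption that the GCC branch uses the `noinline` and `noclone` attributes (GCC's spelling for "do not clone"). `SKETCH_OPTIMIZATION_BARRIER` and `sketch_opaque_add` are hypothetical names, not glibc's.

```c
/* Hypothetical equivalent of an optimization-barrier attribute:
   keep the function out of line and prevent specialised copies.  */
#ifdef __clang__
# define SKETCH_OPTIMIZATION_BARRIER __attribute__ ((optnone))
#else
# define SKETCH_OPTIMIZATION_BARRIER __attribute__ ((noinline, noclone))
#endif

/* A function the compiler must call as written, e.g. so a test can
   observe the behaviour of a real call rather than folded constants.  */
SKETCH_OPTIMIZATION_BARRIER int sketch_opaque_add(int a, int b)
{
  return a + b;
}
```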
2024-12-22 | x86: Define __HAVE_FLOAT128 for Clang and use __builtin_*f128 code path | Fangrui Song | 1 | -8/+16
Clang supports __builtin_fabsf128 (despite not supporting _Float128) but it does not support __builtin_fabsq. Fall back to `typedef __float128 _Float128;` if Clang is used. Originally developed by Fangrui Song <maskray@google.com>. Co-Authored-By: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>
2024-12-22 | x86: Use inhibit_stack_protector on tst-ifunc-isa.h | Adhemerval Zanella | 1 | -2/+3
Co-Authored-By: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>
2024-12-22 | x86: Include test-flt-eval-method-387 if -mfpmath=387 works | H.J. Lu | 3 | -1/+47
Since Clang doesn't support -mfpmath=387 on x86-64, on x86, include test-flt-eval-method-387 only if -mfpmath=387 works. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>
2024-12-20 | elf: Introduce is_rtld_link_map | Florian Weimer | 1 | -1/+1
Unconditionally define it to false for static builds. This avoids the awkward use of weak_extern for _dl_rtld_map in checks that cannot be possibly true on static builds. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-12-18 | sys/platform/x86.h: Do not depend on _Bool definition in C++ mode | H.J. Lu | 2 | -3/+3
Clang does not define _Bool for -std=c++98: /usr/include/bits/platform/features.h:31:19: error: unknown type name '_Bool' 31 | static __inline__ _Bool | ^ Change _Bool to bool to silence clang++ error. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-12-17 | x86: Avoid integer truncation with large cache sizes (bug 32470) | Florian Weimer | 1 | -2/+2
Some hypervisors report 1 TiB L3 cache size. This results in some variables incorrectly getting zeroed, causing crashes in memcpy/memmove because invariants are violated.
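The failure mode can be reproduced in a few lines. The sketch below assumes a threshold derived as three quarters of a cache size; the function names and that formula are illustrative only, since the log does not show which variables or expressions the commit changed.

```c
#include <stdint.h>

/* Hypothetical narrow computation: the 64-bit size is squeezed
   through a 32-bit variable first.  1 TiB (2^40) has no bits set
   in the low 32, so the result collapses to zero.  */
static inline uint32_t sketch_threshold_narrow(uint64_t cache_size)
{
  uint32_t truncated = (uint32_t) cache_size;
  return truncated / 4 * 3;
}

/* The same computation kept in a 64-bit type throughout.  */
static inline uint64_t sketch_threshold_wide(uint64_t cache_size)
{
  return cache_size / 4 * 3;
}
```

A threshold of zero is the kind of violated invariant that can send memcpy/memmove down a path it was never meant to take for small sizes.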
2024-12-16 | Fix sysdeps/x86/fpu/Makefile: Split and sort tests | H.J. Lu | 1 | -1/+2
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-12-16 | sysdeps/x86/fpu/Makefile: Split and sort tests | H.J. Lu | 1 | -2/+7
Split and sort tests in sysdeps/x86/fpu/Makefile. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-11-25 | Silence most -Wzero-as-null-pointer-constant diagnostics | Alejandro Colomar | 1 | -1/+1
Replace 0 by NULL and {0} by {}. Omit a few cases that aren't so trivial to fix. Link: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117059> Link: <https://software.codidact.com/posts/292718/292759#answer-292759> Signed-off-by: Alejandro Colomar <alx@kernel.org>
2024-08-26 | x86: Enable non-temporal memset for Hygon processors | Feifei Wang | 2 | -3/+8
This patch uses 'Avoid_Non_Temporal_Memset' flag to access the non-temporal memset implementation for Hygon processors.

Test Results (new performance time / old performance time, with x86_memset_non_temporal_threshold = 8MB on each arch):

size | hygon1 | hygon2 | hygon3
1MB  | 0.994  | 1      | 1
4MB  | 0.996  | 1      | 0.990
8MB  | 0.670  | 1.312  | 0.737
16MB | 0.343  | 0.822  | 0.390
32MB | 0.355  | 0.830  | 0.401

For Hygon arch with this patch, non-temporal stores can improve performance by 20% - 65%.

Signed-off-by: Feifei Wang <wangfeifei@hygon.cn>
Reviewed-by: Jing Li <lijing@hygon.cn>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-08-26 | x86: Add cache information support for Hygon processors | Feifei Wang | 1 | -0/+60
Add a Hygon branch in the dl_init_cacheinfo function to initialize cache size variables for Hygon processors. Also add the handle_hygon() function to get cache information. Signed-off-by: Feifei Wang <wangfeifei@hygon.cn> Reviewed-by: Jing Li <lijing@hygon.cn> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-08-26 | x86: Add new architecture type for Hygon processors | Feifei Wang | 2 | -3/+17
Add a new architecture type arch_kind_hygon to split the Hygon branch from AMD. This lets Hygon processors use settings suited to their own characteristics. Signed-off-by: Feifei Wang <wangfeifei@hygon.cn> Reviewed-by: Jing Li <lijing@hygon.cn> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-08-15 | x86: Add `Avoid_STOSB` tunable to allow NT memset without ERMS | Noah Goldstein | 5 | -7/+40
The goal of this flag is to allow targets which don't prefer/have ERMS to still access the non-temporal memset implementation. There are 4 cases for tuning memset:

1) `Avoid_STOSB && Avoid_Non_Temporal_Memset` - Memset with temporal stores
2) `Avoid_STOSB && !Avoid_Non_Temporal_Memset` - Memset with temporal/non-temporal stores. Non-temporal path goes through `rep stosb` path. We accomplish this by setting `x86_rep_stosb_threshold` to `x86_memset_non_temporal_threshold`.
3) `!Avoid_STOSB && Avoid_Non_Temporal_Memset` - Memset with temporal stores/`rep stosb`
4) `!Avoid_STOSB && !Avoid_Non_Temporal_Memset` - Memset with temporal stores/`rep stosb`/non-temporal stores.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
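The four tuning cases can be summarised as a small decision function. This is a sketch only: the enum, the function names, and the idea of returning a bit mask are assumptions made for illustration, while the threshold adjustment in the second function follows the commit's description of the `Avoid_STOSB && !Avoid_Non_Temporal_Memset` case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit mask of the store strategies memset may use.  */
enum sketch_memset_path
{
  SKETCH_TEMPORAL = 1,
  SKETCH_REP_STOSB = 2,
  SKETCH_NON_TEMPORAL = 4
};

/* Map the two flags to the set of enabled paths.  */
static unsigned int
sketch_memset_path_mask(bool avoid_stosb, bool avoid_non_temporal)
{
  unsigned int paths = SKETCH_TEMPORAL;   /* always available */
  if (!avoid_stosb)
    paths |= SKETCH_REP_STOSB;
  if (!avoid_non_temporal)
    paths |= SKETCH_NON_TEMPORAL;
  return paths;
}

/* When STOSB is avoided but non-temporal stores are wanted, the
   rep-stosb threshold is set to the non-temporal threshold so that
   sizes reaching the rep-stosb path land on non-temporal stores.  */
static size_t
sketch_rep_stosb_threshold(bool avoid_stosb, bool avoid_non_temporal,
                           size_t rep_stosb_threshold,
                           size_t non_temporal_threshold)
{
  if (avoid_stosb && !avoid_non_temporal)
    return non_temporal_threshold;
  return rep_stosb_threshold;
}
```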
2024-08-15 | x86: Use `Avoid_Non_Temporal_Memset` to control non-temporal path | Noah Goldstein | 2 | -8/+23
This is just a refactor and there should be no behavioral change from this commit. The goal is to make `Avoid_Non_Temporal_Memset` a more universal knob for controlling whether we use non-temporal memset rather than having extra logic based on vendor. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>