Age | Commit message (Collapse) | Author | Files | Lines |
|
The generic implementation is slight more optimized than the powerpc
one, where it has a more optimized inf/nan check (by not using FP
unit checks, along with branch prediction hints), and removed one
branch by issuing trunc instead of a combination of floor/ceil (which
also generated less code).
On power10 with gcc 14.2.1:
reciprocal-throughput master patch difference
workload-0_1 1.1351 0.9067 20.12%
workload-1_maxint 1.4230 0.9040 36.47%
workload-maxint_maxfloat 1.5038 0.9076 39.65%
workload-integral 1.1280 0.9111 19.23%
latency master patch difference
workload-0_1 1.1440 2.7117 -137.03%
workload-1_maxint 4.0556 2.7070 33.25%
workload-maxint_maxfloat 3.2122 2.7164 15.43%
workload-integral 3.2381 2.7281 15.75%
Checked on powerpc64le-linux-gnu.
Reviewed-by: Sachin Monga <smonga@linux.ibm.com>
|
|
The generic implementation is slight more optimized than the powerpc
one, where it has a more optimized inf/nan check (by not using FP
unit checks, along with branch prediction hints), and removed one
branch by issuing trunc instead of a combination of floor/ceil (which
also generated less code).
On power10 with gcc 14.2.1:
reciprocal-throughput master patch difference
workload-0_1 1.5210 1.3942 8.34%
workload-1_maxint 2.0926 1.3940 33.38%
workload-maxint_maxfloat 1.7851 1.3940 21.91%
workload-integral 1.5216 1.3941 8.37%
latency master patch difference
workload-0_1 1.5928 2.6337 -65.35%
workload-1_maxint 3.2929 2.6337 20.02%
workload-maxint_maxfloat 1.9697 2.6341 -33.73%
workload-integral 2.0597 2.6337 -27.87%
Checked on powerpc64le-linux-gnu.
Reviewed-by: Sachin Monga <smonga@linux.ibm.com>
|
|
They were misattributed as POWER9 sources.
|
|
Now that we require at least binutils 2.39 the support for POWER9 and
POWER10 instructions can be assumed.
|
|
This reverts commit 3367d8e180848030d1646f088759f02b8dfe0d6f
Reason for revert: Power10 strcmp clobbers non-volatile vector
registers (Bug 33056)
Tested on ppc64le without regression.
|
|
This reverts commit b9182c793caa05df5d697427c0538936e6396d4b
Reason for revert: Power10 memchr clobbers v20 vector register
(Bug 33059)
This is not a security issue, unlike CVE-2025-5745 and
CVE-2025-5702.
Tested on ppc64le without regression.
|
|
(CVE-2025-5702)
This reverts commit 90bcc8721ef82b7378d2b080141228660e862d56
This change is in the chain of the final revert that fixes the CVE
i.e. 3367d8e180848030d1646f088759f02b8dfe0d6f
Reason for revert: Power10 strcmp clobbers non-volatile vector
registers (Bug 33056)
Tested on ppc64le with no regressions.
|
|
This reverts commit 23f0d81608d0ca6379894ef81670cf30af7fd081
Reason for revert: Power10 strncmp clobbers non-volatile vector
registers (Bug 33060)
Tested on ppc64le with no regressions.
|
|
It removes the wrapper by moving the error/EDOM handling to an
out-of-line implementation (__math_invalidf_i/__math_invalidf_li).
Also, __glibc_unlikely is used on errors case since it helps
code generation on recent gcc.
The code now builds to with gcc-14 on aarch64:
0000000000000000 <__ilogbf>:
0: 1e260000 fmov w0, s0
4: d3577801 ubfx x1, x0, #23, #8
8: 340000e1 cbz w1, 24 <__ilogbf+0x24>
c: 5101fc20 sub w0, w1, #0x7f
10: 7103fc3f cmp w1, #0xff
14: 54000040 b.eq 1c <__ilogbf+0x1c> // b.none
18: d65f03c0 ret
1c: 12b00000 mov w0, #0x7fffffff // #2147483647
20: 14000000 b 0 <__math_invalidf_i>
24: 53175800 lsl w0, w0, #9
28: 340000a0 cbz w0, 3c <__ilogbf+0x3c>
2c: 5ac01000 clz w0, w0
30: 12800fc1 mov w1, #0xffffff81 // #-127
34: 4b000020 sub w0, w1, w0
38: d65f03c0 ret
3c: 320107e0 mov w0, #0x80000001 // #-2147483647
40: 14000000 b 0 <__math_invalidf_i>
Some ABI requires additional adjustments:
* i386 and m68k requires to use the template version, since
both provide __ieee754_ilogb implementatations.
* loongarch uses a custom implementation as well.
* powerpc64le also has a custom implementation for POWER9, which
is also used for float and float128 version. The generic
e_ilogb.c implementation is moved on powerpc to keep the
current code as-is.
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
It removes the wrapper by moving the error/EDOM handling to an
out-of-line implementation (__math_invalid_i/__math_invalid_li).
Also, __glibc_unlikely is used on errors case since it helps
code generation on recent gcc.
The code now builds to with gcc-14 on aarch64:
0000000000000000 <__ilogb>:
0: 9e660000 fmov x0, d0
4: d374f801 ubfx x1, x0, #52, #11
8: 340000e1 cbz w1, 24 <__ilogb+0x24>
c: 510ffc20 sub w0, w1, #0x3ff
10: 711ffc3f cmp w1, #0x7ff
14: 54000040 b.eq 1c <__ilogb+0x1c> // b.none
18: d65f03c0 ret
1c: 12b00000 mov w0, #0x7fffffff // #2147483647
20: 14000000 b 0 <__math_invalid_i>
24: d374cc00 lsl x0, x0, #12
28: b40000a0 cbz x0, 3c <__ilogb+0x3c>
2c: dac01000 clz x0, x0
30: 12807fc1 mov w1, #0xfffffc01 // #-1023
34: 4b000020 sub w0, w1, w0
38: d65f03c0 ret
3c: 320107e0 mov w0, #0x80000001 // #-2147483647
40: 14000000 b 0 <__math_invalid_i>
Some ABI requires additional adjustments:
* i386 and m68k requires to use the template version, since
both provide __ieee754_ilogb implementatations.
* loongarch uses a custom implementation as well.
* powerpc64le also has a custom implementation for POWER9, which
is also used for float and float128 version. The generic
e_ilogb.c implementation is moved on powerpc to keep the
current code as-is.
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
Due to raising the minimum binutils version to >= 2.26, the configure
check for testing support of --update-section is not needed anymore.
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>
|
|
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the compoundn functions, which compute (1+X) to the
power Y for integer Y (and X at least -1). The integer exponent has
type long long int in C23; it was intmax_t in TS 18661-4, and as with
other interfaces changed after their initial appearance in the TS, I
don't think we need to support the original version of the interface.
Note that these functions are "compoundn" with a trailing "n", *not*
"compound" (CORE-MATH has the wrong name, for example).
As with pown, I strongly encourage searching for worst cases for ulps
error for these implementations (necessarily non-exhaustively, given
the size of the input space). I also expect a custom implementation
for a given format could be much faster as well as more accurate (I
haven't tested or benchmarked the CORE-MATH implementation for
binary32); this is one of the more complicated and less efficient
functions to implement in a type-generic way.
As with exp2m1 and exp10m1, this showed up places where the
powerpc64le IFUNC setup is not as self-contained as one might hope (in
this case, without the changes specific to powerpc64le, there were
undefined references to __GI___expf128).
Tested for x86_64 and x86, and with build-many-glibcs.py.
|
|
These routines are not extensively used (gnulib documentation even
recommend use a replacement [1]), and there is already a POWER8
version that uses proper vectorized instructions.
[1] https://www.gnu.org/software/gnulib/manual/gnulib.html#C-strings
Checked with a build for some powerpc variations.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
|
|
This is only needed for -mno-secure-plt, and this linkage mode
is not supported with powerpc64 and powerp64le.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
|
|
Code used during early static startup in elf/dl-tls.c uses
__mempcpy.
Fixes commit cbd9fd236981717d3d4ee942986ea912e9707c32 ("Consolidate
TLS block allocation for static binaries with ld.so").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
Unconditionally define it to false for static builds.
This avoids the awkward use of weak_extern for _dl_rtld_map
in checks that cannot be possibly true on static builds.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Fix a big-endian / non-ROP build failure caused by commit 4d9a4c02 when
building dl-trampoline.S.
Reported-by: Joseph Myers <josmyers@redhat.com>
|
|
Add ROP protection for the _dl_runtime_resolve and _dl_profile_resolve
functions.
|
|
Add ROP protection for the getcontext, setcontext, makecontext, swapcontext
and __sigsetjmp_symbol functions.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
|
|
Add ROP protect instructions to strncpy and ppc-mount functions.
Modify FRAME_MIN_SIZE to 48 bytes for ELFv2 to reserve additional
16 bytes for ROP save slot and padding.
Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
|
|
The ROP instructions were added in ISA 3.1 (ie, Power10), however they
were defined so that if executed on older cpus, they would behave as
nops. This allows us to emit them on older cpus and they'd just be
ignored, but if run on a Power10, then the binary would be ROP protected.
Hash instructions use negative offsets so the default position
of ROP pointer is FRAME_ROP_SAVE from caller's SP.
Modified FRAME_MIN_SIZE_PARM to 112 for ELFv2 to reserve
additional 16 bytes for ROP save slot and padding.
Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
|
|
This patch adds an optimized strcat which makes use of the default
strcat function which calls the Power10 strcpy and strlen routines.
|
|
The ABI requires all stack frames be 16-byte aligned.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
|
|
Several copies of the licenses in files contained whitespace related
problems. Two cases are addressed here, the first is two spaces
after a period which appears between "PURPOSE." and "See". The other
is a space after the last forward slash in the URL. Both issues are
corrected and the licenses now match the official textual description
of the license (and the other license in the sources).
Since these whitespaces changes do not alter the paragraph structure of
the license, nor create new sentences, they do not change the license.
|
|
This fixes several test failures:
=====FAIL: stdlib/tst-strtod1i.out=====
Locale tests
all OK
Locale tests
all OK
Locale tests
strtold("1,5") returns -6,38643e+367 and not 1,5
strtold("1.5") returns 1,5 and not 1
strtold("1.500") returns 1 and not 1500
strtold("36.893.488.147.419.103.232") returns 1500 and not 3,68935e+19
Locale tests
all OK
=====FAIL: stdlib/tst-strtod3.out=====
0: got wrong results -2.5937e+4826, expected 0
=====FAIL: stdlib/tst-strtod4.out=====
0: got wrong results -6,38643e+367, expected 0
1: got wrong results 0, expected 1e+06
2: got wrong results 1e+06, expected 10
=====FAIL: stdlib/tst-strtod5i.out=====
0: got wrong results -6,38643e+367, expected 0
2: got wrong results 0, expected -0
4: got wrong results -0, expected 0
5: got wrong results 0, expected -0
6: got wrong results -0, expected 0
7: got wrong results 0, expected -0
8: got wrong results -0, expected 0
9: got wrong results 0, expected -0
10: got wrong results -0, expected 0
11: got wrong results 0, expected -0
12: got wrong results -0, expected 0
13: got wrong results 0, expected -0
14: got wrong results -0, expected 0
15: got wrong results 0, expected -0
16: got wrong results -0, expected 0
17: got wrong results 0, expected -0
18: got wrong results -0, expected 0
20: got wrong results 0, expected -0
22: got wrong results -0, expected 0
23: got wrong results 0, expected -0
24: got wrong results -0, expected 0
25: got wrong results 0, expected -0
26: got wrong results -0, expected 0
27: got wrong results 0, expected -0
Fixes commit 3fc063dee01da4f80920a14b7db637c8501d6fd4
("Make __strtod_internal tests type-generic").
Suggested-by: Joseph Myers <josmyers@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
In __syscall_cancel_arch, there's a tail call to __syscall_do_cancel.
On P10, since the caller uses the TOC and the callee is using
PC-relative addressing, there's only a branch instruction with no NOPs
to restore the TOC, which causes the build error. The fix involves adding
the NOTOC directive to the branch instruction, informing the linker
not to generate a TOC stub, thus resolving the issue.
|
|
This patch modifies the current Power9 implementation of strcpy and
stpcpy to optimize it for Power9 and Power10.
No new Power10 instructions are used, so the original Power9 strcpy
is modified instead of creating a new implementation for Power10.
The changes also affect stpcpy, which uses the same implementation
with some additional code before returning.
Improvements compared to the old Power9 version:
Use simple comparisons for the first ~512 bytes:
The main loop is good for long strings, but comparing 16B each time is
better for shorter strings. After aligning the address to 16 bytes, we
unroll the loop four times, checking 128 bytes each time. There may be
some overlap with the main loop for unaligned strings, but it is better
for shorter strings.
Loop with 64 bytes for longer bytes:
Use 4 consecutive lxv/stxv instructions.
Showed an average improvement of 13%.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
|
|
The current racy approach is to enable asynchronous cancellation
before making the syscall and restore the previous cancellation
type once the syscall returns, and check if cancellation has happen
during the cancellation entrypoint.
As described in BZ#12683, this approach shows 2 problems:
1. Cancellation can act after the syscall has returned from the
kernel, but before userspace saves the return value. It might
result in a resource leak if the syscall allocated a resource or a
side effect (partial read/write), and there is no way to program
handle it with cancellation handlers.
2. If a signal is handled while the thread is blocked at a cancellable
syscall, the entire signal handler runs with asynchronous
cancellation enabled. This can lead to issues if the signal
handler call functions which are async-signal-safe but not
async-cancel-safe.
For the cancellation to work correctly, there are 5 points at which the
cancellation signal could arrive:
[ ... )[ ... )[ syscall ]( ...
1 2 3 4 5
1. Before initial testcancel, e.g. [*... testcancel)
2. Between testcancel and syscall start, e.g. [testcancel...syscall start)
3. While syscall is blocked and no side effects have yet taken
place, e.g. [ syscall ]
4. Same as 3 but with side-effects having occurred (e.g. a partial
read or write).
5. After syscall end e.g. (syscall end...*]
And libc wants to act on cancellation in cases 1, 2, and 3 but not
in cases 4 or 5. For the 4 and 5 cases, the cancellation will eventually
happen in the next cancellable entrypoint without any further external
event.
The proposed solution for each case is:
1. Do a conditional branch based on whether the thread has received
a cancellation request;
2. It can be caught by the signal handler determining that the saved
program counter (from the ucontext_t) is in some address range
beginning just before the "testcancel" and ending with the
syscall instruction.
3. SIGCANCEL can be caught by the signal handler and determine that
the saved program counter (from the ucontext_t) is in the address
range beginning just before "testcancel" and ending with the first
uninterruptable (via a signal) syscall instruction that enters the
kernel.
4. In this case, except for certain syscalls that ALWAYS fail with
EINTR even for non-interrupting signals, the kernel will reset
the program counter to point at the syscall instruction during
signal handling, so that the syscall is restarted when the signal
handler returns. So, from the signal handler's standpoint, this
looks the same as case 2, and thus it's taken care of.
5. For syscalls with side-effects, the kernel cannot restart the
syscall; when it's interrupted by a signal, the kernel must cause
the syscall to return with whatever partial result is obtained
(e.g. partial read or write).
6. The saved program counter points just after the syscall
instruction, so the signal handler won't act on cancellation.
This is similar to 4. since the program counter is past the syscall
instruction.
So The proposed fixes are:
1. Remove the enable_asynccancel/disable_asynccancel function usage in
cancellable syscall definition and instead make them call a common
symbol that will check if cancellation is enabled (__syscall_cancel
at nptl/cancellation.c), call the arch-specific cancellable
entry-point (__syscall_cancel_arch), and cancel the thread when
required.
2. Provide an arch-specific generic system call wrapper function
that contains global markers. These markers will be used in
SIGCANCEL signal handler to check if the interruption has been
called in a valid syscall and if the syscalls has side-effects.
A reference implementation sysdeps/unix/sysv/linux/syscall_cancel.c
is provided. However, the markers may not be set on correct
expected places depending on how INTERNAL_SYSCALL_NCS is
implemented by the architecture. It is expected that all
architectures add an arch-specific implementation.
3. Rewrite SIGCANCEL asynchronous handler to check for both canceling
type and if current IP from signal handler falls between the global
markers and act accordingly.
4. Adjust libc code to replace LIBC_CANCEL_ASYNC/LIBC_CANCEL_RESET to
use the appropriate cancelable syscalls.
5. Adjust 'lowlevellock-futex.h' arch-specific implementations to
provide cancelable futex calls.
Some architectures require specific support on syscall handling:
* On i386 the syscall cancel bridge needs to use the old int80
instruction because the optimized vDSO symbol the resulting PC value
for an interrupted syscall points to an address outside the expected
markers in __syscall_cancel_arch. It has been discussed in LKML [1]
on how kernel could help userland to accomplish it, but afaik
discussion has stalled.
Also, sysenter should not be used directly by libc since its calling
convention is set by the kernel depending of the underlying x86 chip
(check kernel commit 30bfa7b3488bfb1bb75c9f50a5fcac1832970c60).
* mips o32 is the only kABI that requires 7 argument syscall, and to
avoid add a requirement on all architectures to support it, mips
support is added with extra internal defines.
Checked on aarch64-linux-gnu, arm-linux-gnueabihf, powerpc-linux-gnu,
powerpc64-linux-gnu, powerpc64le-linux-gnu, i686-linux-gnu, and
x86_64-linux-gnu.
[1] https://lkml.org/lkml/2016/3/8/1105
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
As discussed at the patch review meeting
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Reviewed-by: Simon Chopin <simon.chopin@canonical.com>
|
|
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the exp2m1 and exp10m1 functions (exp2(x)-1 and
exp10(x)-1, like expm1).
As with other such functions, these use type-generic templates that
could be replaced with faster and more accurate type-specific
implementations in future. Test inputs are copied from those for
expm1, plus some additions close to the overflow threshold (copied
from exp2 and exp10) and also some near the underflow threshold.
exp2m1 has the unusual property of having an input (M_MAX_EXP) where
whether the function overflows (under IEEE semantics) depends on the
rounding mode. Although these could reasonably be XFAILed in the
testsuite (as we do in some cases for arguments very close to a
function's overflow threshold when an error of a few ulps in the
implementation can result in the implementation not agreeing with an
ideal one on whether overflow takes place - the testsuite isn't smart
enough to handle this automatically), since these functions aren't
required to be correctly rounding, I made the implementation check for
and handle this case specially.
The Makefile ordering expected by lint-makefiles for the new functions
is a bit peculiar, but I implemented it in this patch so that the test
passes; I don't know why log2 also needed moving in one Makefile
variable setting when it didn't in my previous patches, but the
failure showed a different place was expected for that function as
well.
The powerpc64le IFUNC setup seems not to be as self-contained as one
might hope; it shouldn't be necessary to add IFUNCs for new functions
such as these simply to get them building, but without setting up
IFUNCs for the new functions, there were undefined references to
__GI___expm1f128 (that IFUNC machinery results in no such function
being defined, but doesn't stop include/math.h from doing the
redirection resulting in the exp2m1f128 and exp10m1f128
implementations expecting to call it).
Tested for x86_64 and x86, and with build-many-glibcs.py.
|
|
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the logp1 functions (aliases for log1p functions - the
name is intended to be more consistent with the new log2p1 and
log10p1, where clearly it would have been very confusing to name those
functions log21p and log101p). As aliases rather than new functions,
the content of this patch is somewhat different from those actually
adding new functions.
Tests are shared with log1p, so this patch *does* mechanically update
all affected libm-test-ulps files to expect the same errors for both
functions.
The vector versions of log1p on aarch64 and x86_64 are *not* updated
to have logp1 aliases (and thus there are no corresponding header,
tests, abilist or ulps changes for vector functions either). It would
be reasonable for such vector aliases and corresponding changes to
other files to be made separately. For now, the log1p tests instead
avoid testing logp1 in the vector case (a Makefile change is needed to
avoid problems with grep, used in generating the .c files for vector
function tests, matching more than one ALL_RM_TEST line in a file
testing multiple functions with the same inputs, when it assumes that
the .inc file only has a single such line).
Tested for x86_64 and x86, and with build-many-glibcs.py.
|
|
For powerpc64 the generic version provides a weak definition of
strchrnul, which are already provided by the ifunc resolver. The
powerpc32 version is slight different, where for static case there
is no iFUNC support.
The strncasecmp_l is provided ifunc resolver.
Checked on powerpc-linux-gnu-power4 and powerpc64-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
#31629]
This patch ensures that $libc_cv_cc_submachine, which is set from
"--with-cpu", overrides $CFLAGS for configure time tests.
Suggested-by: Peter Bergner <bergner@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
|
|
This patch is based on __strcmp_power10.
Improvements from __strncmp_power9:
1. Uses new POWER10 instructions
- This code uses lxvp to decrease contention on load
by loading 32 bytes per instruction.
2. Performance implication
- This version has around 38% better performance on average.
- Minor performance regression is seen for few small sizes
and specific combination of alignments.
Signed-off-by: Amrita H S <amritahs@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
|
|
These structs describe file formats under /var/log, and should not
depend on the definition of _TIME_BITS. This is achieved by
defining __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that
support 32-bit time_t values (where __time_t is 32 bits).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
This seems to have stopped working with some GCC 14 versions,
which clobber r2. With other compilers, the kernel-provided
r2 value is still available at this point.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
|
|
related changes.
The following three changes have been added to provide initial Power11 support.
1. Add the directories to hold Power11 files.
2. Add support to select Power11 libraries based on AT_PLATFORM.
3. Let submachine=power11 be set automatically.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
|
|
Similar to strstr (1e9a550ba4), power8 strcasestr does not show much
improvement compared to the generic implementation. The geomean
on bench-strcasestr shows:
__strcasestr_power8 __strcasestr_ppc
power10 1159 1120
power9 1640 1469
power8 1787 1904
The strcasestr uses the same 'trick' as power7 strstr to detect
potential quadradic behavior, which only adds overheads for input
that trigger quadradic behavior and it is really a hack.
Checked on powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
|
|
The optimization is not faster than the generic algorithm,
using the bench-strstr the geometric mean running on a POWER10 machine
using gcc 13.1.1 is 482.47 while the default __strstr_ppc is 340.97
(which uses the generic implementation).
Also, there is no need to redirect the internal str*/mem* call
to optimized version, internal ifunc is supported and enabled
for internal calls (meaning that the generic implementation
will use any asm optimization if available).
Checked on powerpc64le-linux-gnu.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
|
|
Complete the internal renaming from "C2X" and related names in GCC by
renaming *-c2x and *-gnu2x tests to *-c23 and *-gnu23.
Tested for x86_64, and with build-many-glibcs.py for powerpc64le.
|
|
It allows to remove a lot of arch-specific implementations.
Checked on x86_64, aarch64, powerpc64.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
Current implementation of strcmp for power10 has
performance regression for multiple small sizes
and alignment combination.
Most of these performance issues are fixed by this
patch. The compare loop is unrolled and page crosses
of unrolled loop is handled.
Thanks to Paul E. Murphy for helping in fixing the
performance issues.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
Co-Authored-By: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
|
|
Optimized memchr for POWER10 based on existing rawmemchr and strlen.
Reordering instructions and loop unrolling helped in getting better performance.
Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
|
|
This patch is based on __strcmp_power9 and __strlen_power10.
Improvements from __strcmp_power9:
1. Uses new POWER10 instructions
- This code uses lxvp to decrease contention on load
by loading 32 bytes per instruction.
2. Performance implication
- This version has around 30% better performance on average.
- Performance regression is seen for a specific combination
of sizes and alignments. Some of them is observed without
changes also, while rest may be induced by the patch.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
|
|
The _dl_non_dynamic_init does not parse LD_PROFILE, which does not
enable profile for dlopen objects. Since dlopen is deprecated for
static objects, it is better to remove the support.
It also allows to trim down libc.a of profile support.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
|
This patch enables the option to influence hwcaps used by PowerPC.
The environment variable, GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,-zzz....,
can be used to enable CPU/ARCH feature yyy, disable CPU/ARCH feature xxx
and zzz, where the feature name is case-sensitive and has to match the ones
mentioned in the file{sysdeps/powerpc/dl-procinfo.c}.
Note that the hwcap tunables only used in the IFUNC selection.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
The compiler might not see that internal definition is an alias
due the libc_ifunc macro, which redefines __strchrnul. With
gcc 6 it fails with:
In file included from <command-line>:0:0:
./../include/libc-symbols.h:472:33: error: ‘__EI___strchrnul’ aliased to
undefined symbol ‘__GI___strchrnul’
extern thread __typeof (name) __EI_##name \
^
./../include/libc-symbols.h:468:3: note: in expansion of macro
‘__hidden_ver2’
__hidden_ver2 (, local, internal, name)
^~~~~~~~~~~~~
./../include/libc-symbols.h:476:29: note: in expansion of macro
‘__hidden_ver1’
# define hidden_def(name) __hidden_ver1(__GI_##name, name, name);
^~~~~~~~~~~~~
./../include/libc-symbols.h:557:32: note: in expansion of macro
‘hidden_def’
# define libc_hidden_def(name) hidden_def (name)
^~~~~~~~~~
../sysdeps/powerpc/powerpc64/multiarch/strchrnul.c:38:1: note: in
expansion of macro ‘libc_hidden_def’
libc_hidden_def (__strchrnul)
^~~~~~~~~~~~~~~
Use libc_ifunc_hidden as stpcpy. Checked on powerpc64 with
gcc 6 and gcc 13.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
Bump autoconf requirement to 2.71 to allow regenerating configure on
more recent distributions. autoconf 2.71 has been in Fedora since F36
and is the current version in Debian stable (bookworm). It appears to
be current in Gentoo as well.
All sysdeps configure and preconfigure scripts have also been
regenerated; all changes are trivial transformations that do not affect
functionality.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|