aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
11 daysnptl: Use <support/check.h> facilities in tst-setuid3release/2.34/masterMaciej W. Rozycki1-21/+16
Remove local FAIL macro in favor to FAIL_EXIT1 from <support/check.h>, which provides equivalent reporting, with the name of the file and the line number within of the failure site additionally included. Remove FAIL_ERR altogether and include ": %m" explicitly with the format string supplied to FAIL_EXIT1 as there seems little value to have a separate macro just for this. Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit 8c98195af6e6f1ce21743fc26c723e0f7e45bcf2)
11 daysposix: Use <support/check.h> facilities in tst-truncate and tst-truncate64Maciej W. Rozycki1-13/+12
Remove local FAIL macro in favor to FAIL_RET from <support/check.h>, which provides equivalent reporting, with the name of the file of the failure site additionally included, for the tst-truncate-common core shared between the tst-truncate and tst-truncate64 tests. Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit fe47595504a55e7bb992f8928533df154b510383)
11 daysungetc: Fix backup buffer leak on program exit [BZ #27821]Siddhesh Poyarekar4-2/+47
If a file descriptor is left unclosed and is cleaned up by _IO_cleanup on exit, its backup buffer remains unfreed, registering as a leak in valgrind. This is not strictly an issue since (1) the program should ideally be closing the stream once it's not in use and (2) the program is about to exit anyway, so keeping the backup buffer around a wee bit longer isn't a real problem. Free it anyway to keep valgrind happy when the streams in question are the standard ones, i.e. stdout, stdin or stderr. Also, the _IO_have_backup macro checks for _IO_save_base, which is a roundabout way to check for a backup buffer instead of directly looking for _IO_backup_base. The roundabout check breaks when the main get area has not been used and user pushes a char into the backup buffer with ungetc. Fix this to use the _IO_backup_base directly. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit 3e1d8d1d1dca24ae90df2ea826a8916896fc7e77) (cherry picked from commit b9f72bd5de931eac39219018c2fa319a449bb2cf)
11 daysungetc: Fix uninitialized read when putting into unused streams [BZ #27821]Siddhesh Poyarekar3-6/+6
When ungetc is called on an unused stream, the backup buffer is allocated without the main get area being present. This results in every subsequent ungetc (as the stream remains in the backup area) checking uninitialized memory in the backup buffer when trying to put a character back into the stream. Avoid comparing the input character with buffer contents when in backup to avoid this uninitialized read. The uninitialized read is harmless in this context since the location is promptly overwritten with the input character, thus fulfilling ungetc functionality. Also adjust wording in the manual to drop the paragraph that says glibc cannot do multiple ungetc back to back since with this change, ungetc can actually do this. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit cdf0f88f97b0aaceb894cc02b21159d148d7065c) (cherry picked from commit 804d3c8db79db204154dcf5e11a76f14fdddc570)
11 daysMake tst-ungetc use libsupportSiddhesh Poyarekar1-55/+57
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit 3f7df7e757f4efec38e45d4068e5492efcac4856) (cherry picked from commit 87a1968a72e4b4e5436f3e2be1ed8a8d5a5862c7)
11 daysstdio-common: Add test for vfscanf with matches longer than INT_MAX [BZ #27650]Maciej W. Rozycki2-0/+113
Complement commit b03e4d7bd25b ("stdio: fix vfscanf with matches longer than INT_MAX (bug 27650)") and add a test case for the issue, inspired by the reproducer provided with the bug report. This has been verified to succeed as from the commit referred and fail beforehand. As the test requires 2GiB of data to be passed around its performance has been evaluated using a choice of systems and the execution time determined to be respectively in the range of 9s for POWER9@2.166GHz, 24s for FU740@1.2GHz, and 40s for 74Kf@950MHz. As this is on the verge of and beyond the default timeout it has been increased by the factor of 8. Regardless, following recent practice the test has been added to the standard rather than extended set. Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit 89cddc8a7096f3d9225868304d2bc0a1aaf07d63) (cherry picked from commit 99ffa84bdcdc3d81e82f448279f0c8278dd30964)
11 dayssupport: Add FAIL test failure helperMaciej W. Rozycki3-41/+17
Add a FAIL test failure helper analogous to FAIL_RET, that does not cause the current function to return, providing a standardized way to report a test failure with a message supplied while permitting the caller to continue executing, for further reporting, cleaning up, etc. Update existing test cases that provide a conflicting definition of FAIL by removing the local FAIL definition and then as follows: - tst-fortify-syslog: provide a meaningful message in addition to the file name already added by <support/check.h>; 'support_record_failure' is already called by 'support_print_failure_impl' invoked by the new FAIL test failure helper. - tst-ctype: no update to FAIL calls required, with the name of the file and the line number within of the failure site additionally included by the new FAIL test failure helper, and error counting plus count reporting upon test program termination also already provided by 'support_record_failure' and 'support_report_failure' respectively, called by 'support_print_failure_impl' and 'adjust_exit_status' also respectively. However in a number of places 'printf' is called and the error count adjusted by hand, so update these places to make use of FAIL instead. And last but not least adjust the final summary just to report completion, with any error count following as reported by the test driver. - test-tgmath2: no update to FAIL calls required, with the name of the file of the failure site additionally included by the new FAIL test failure helper. Also there is no need to track the return status by hand as any call to FAIL will eventually cause the test case to return an unsuccesful exit status regardless of the return status from the test function, via a call to 'adjust_exit_status' made by the test driver. Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit 1b97a9f23bf605ca608162089c94187573fb2a9e) (cherry picked from commit 28f358bc4209ab0425170cdccf65bb1fe861148f)
11 daysstdio-common: Reformat Makefile.Carlos O'Donell1-68/+227
Reflow Makefile. Sort using scripts/sort-makefile-lines.py. Code generation is changed as routines are linked in sorted order as expected. No regressions on x86_64 and i686. (cherry picked from commit c3004417afc98585089a9282d1d4d60cdef5317a) (cherry picked from commit 5b4e90230b8147e273585bf296bf1a9fb6e2b4c2)
2024-08-06Fix name space violation in fortify wrappers (bug 32052)Andreas Schwab5-64/+64
Rename the identifier sz to __sz everywhere. Fixes: a643f60c53 ("Make sure that the fortified function conditionals are constant") (cherry picked from commit 39ca997ab378990d5ac1aadbaa52aaf1db6d526f) (redone from scratch because of many conflicts)
2024-08-01resolv: Fix tst-resolv-short-response for older GCC (bug 32042)Florian Weimer1-2/+4
Previous GCC versions do not support the C23 change that allows labels on declarations. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit ec119972cb2598c04ec7d4219e20506006836f64)
2024-07-24resolv: Track single-request fallback via _res._flags (bug 31476)Florian Weimer3-5/+10
This avoids changing _res.options, which inteferes with change detection as part of automatic reloading of /etc/resolv.conf. Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit 868ab8923a2ec977faafec97ecafac0c3159c1b2)
2024-07-24resolv: Do not wait for non-existing second DNS response after error (bug 30081)Florian Weimer5-1/+150
In single-request mode, there is no second response after an error because the second query has not been sent yet. Waiting for it introduces an unnecessary timeout. Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit af625987d619388a100b153520d3ee308bda9889)
2024-07-24resolv: Allow short error responses to match any query (bug 31890)Florian Weimer4-10/+135
Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit 691a3b2e9bfaba842e46a5ccb7f5e6ea144c3ade)
2024-07-16s390x: Fix segfault in wcsncmp [BZ #31934]Stefan Liebler1-9/+1
The z13/vector-optimized wcsncmp implementation segfaults if n=1 and there is only one character (equal on both strings) before the page end. Then it loads and compares one character and misses to check n again. The following load fails. This patch removes the extra load and compare of the first character and just start with the loop which uses vector-load-to-block-boundary. This code-path also checks n. With this patch both tests are passing: - the simplified one mentioned in the bugzilla 31934 - the full one in Florian Weimer's patch: "manual: Document a GNU extension for strncmp/wcsncmp" (https://patchwork.sourceware.org/project/glibc/patch/874j9eml6y.fsf@oldenburg.str.redhat.com/): On s390x-linux-gnu (z16), the new wcsncmp test fails due to bug 31934. Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit 9b7651410375ec8848a1944992d663d514db4ba7)
2024-05-10Force DT_RPATH for --enable-hardcoded-path-in-testsH.J. Lu1-2/+5
On Fedora 40/x86-64, linker enables --enable-new-dtags by default which generates DT_RUNPATH instead of DT_RPATH. Unlike DT_RPATH, DT_RUNPATH only applies to DT_NEEDED entries in the executable and doesn't applies to DT_NEEDED entries in shared libraries which are loaded via DT_NEEDED entries in the executable. Some glibc tests have libstdc++.so.6 in DT_NEEDED, which has libm.so.6 in DT_NEEDED. When DT_RUNPATH is generated, /lib64/libm.so.6 is loaded for such tests. If the newly built glibc is older than glibc 2.36, these tests fail with assert/tst-assert-c++: /export/build/gnu/tools-build/glibc-gitlab-release/build-x86_64-linux/libc.so.6: version `GLIBC_2.36' not found (required by /lib64/libm.so.6) assert/tst-assert-c++: /export/build/gnu/tools-build/glibc-gitlab-release/build-x86_64-linux/libc.so.6: version `GLIBC_ABI_DT_RELR' not found (required by /lib64/libm.so.6) Pass -Wl,--disable-new-dtags to linker when building glibc tests with --enable-hardcoded-path-in-tests. This fixes BZ #31719. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 2dcaf70643710e22f92a351e36e3cff8b48c60dc)
2024-05-02elf: Disable some subtests of ifuncmain1, ifuncmain5 for !PIEFlorian Weimer2-0/+22
(cherry picked from commit 9cc9d61ee12f2f8620d8e0ea3c42af02bf07fe1e)
2024-05-02nscd: Use time_t for return type of addgetnetgrentXFlorian Weimer1-2/+2
Using int may give false results for future dates (timeouts after the year 2028). Fixes commit 04a21e050d64a1193a6daab872bca2528bda44b ("CVE-2024-33601, CVE-2024-33602: nscd: netgroup: Use two buffers in addgetnetgrentX (bug 31680)"). Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit 4bbca1a44691a6e9adcee5c6798a707b626bc331)
2024-05-02login: structs utmp, utmpx, lastlog _TIME_BITS independence (bug 30701)Florian Weimer18-22/+165
These structs describe file formats under /var/log, and should not depend on the definition of _TIME_BITS. This is achieved by defining __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that support 32-bit time_t values (where __time_t is 32 bits). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 9abdae94c7454c45e02e97e4ed1eb1b1915d13d8)
2024-05-02login: Check default sizes of structs utmp, utmpx, lastlogFlorian Weimer16-1/+85
The default <utmp-size.h> is for ports with a 64-bit time_t. Ports with a 32-bit time_t or with __WORDSIZE_TIME64_COMPAT32=1 need to override it. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 4d4da5aab936504b2d3eca3146e109630d9093c4)
2024-05-02sparc: Remove 64 bit check on sparc32 wordsize (BZ 27574)Adhemerval Zanella1-9/+4
The sparc32 is always 32 bits. Checked on sparcv9-linux-gnu. (cherry picked from commit dd57f5e7b652772499cb220d78157c1038d24f06)
2024-05-02malloc: Exit early on test failure in tst-reallocFlorian Weimer1-31/+15
This addresses more (correct) use-after-free warnings reported by GCC 12 on some targets. Fixes commit c094c232eb3246154265bb035182f92fe1b17ab8 ("Avoid -Wuse-after-free in tests [BZ #26779]."). Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit d653fd2d9ebe23c2b16b76edf717c5dbd5ce9b77)
2024-04-25CVE-2024-33601, CVE-2024-33602: nscd: netgroup: Use two buffers in ↵Florian Weimer1-98/+121
addgetnetgrentX (bug 31680) This avoids potential memory corruption when the underlying NSS callback function does not use the buffer space to store all strings (e.g., for constant strings). Instead of custom buffer management, two scratch buffers are used. This increases stack usage somewhat. Scratch buffer allocation failure is handled by return -1 (an invalid timeout value) instead of terminating the process. This fixes bug 31679. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit c04a21e050d64a1193a6daab872bca2528bda44b)
2024-04-25CVE-2024-33600: nscd: Avoid null pointer crashes after notfound response ↵Florian Weimer1-4/+7
(bug 31678) The addgetnetgrentX call in addinnetgrX may have failed to produce a result, so the result variable in addinnetgrX can be NULL. Use db->negtimeout as the fallback value if there is no result data; the timeout is also overwritten below. Also avoid sending a second not-found response. (The client disconnects after receiving the first response, so the data stream did not go out of sync even without this fix.) It is still beneficial to add the negative response to the mapping, so that the client can get it from there in the future, instead of going through the socket. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit b048a482f088e53144d26a61c390bed0210f49f2)
2024-04-25CVE-2024-33600: nscd: Do not send missing not-found response in ↵Florian Weimer1-8/+6
addgetnetgrentX (bug 31678) If we failed to add a not-found response to the cache, the dataset point can be null, resulting in a null pointer dereference. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit 7835b00dbce53c3c87bbbb1754a95fb5e58187aa)
2024-04-25CVE-2024-33599: nscd: Stack-based buffer overflow in netgroup cache (bug 31677)Florian Weimer1-2/+3
Using alloca matches what other caches do. The request length is bounded by MAXKEYLEN. Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit 87801a8fd06db1d654eea3e4f7626ff476a9bdaa)
2024-04-17iconv: ISO-2022-CN-EXT: fix out-of-bound writes when writing escape sequence ↵Charles Fol3-1/+144
(CVE-2024-2961) ISO-2022-CN-EXT uses escape sequences to indicate character set changes (as specified by RFC 1922). While the SOdesignation has the expected bounds checks, neither SS2designation nor SS3designation have its; allowing a write overflow of 1, 2, or 3 bytes with fixed values: '$+I', '$+J', '$+K', '$+L', '$+M', or '$*H'. Checked on aarch64-linux-gnu. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit f9dc609e06b1136bb0408be9605ce7973a767ada)
2024-04-14powerpc: Fix ld.so address determination for PCREL mode (bug 31640)Florian Weimer1-0/+19
This seems to have stopped working with some GCC 14 versions, which clobber r2. With other compilers, the kernel-provided r2 value is still available at this point. Reviewed-by: Peter Bergner <bergner@linux.ibm.com> (cherry picked from commit 14e56bd4ce15ac2d1cc43f762eb2e6b83fec1afe)
2024-04-09AArch64: Check kernel version for SVE ifuncsWilco Dijkstra5-3/+54
Old Linux kernels disable SVE after every system call. Calling the SVE-optimized memcpy afterwards will then cause a trap to reenable SVE. As a result, applications with a high use of syscalls may run slower with the SVE memcpy. This is true for kernels between 4.15.0 and before 6.2.0, except for 5.14.0 which was patched. Avoid this by checking the kernel version and selecting the SVE ifunc on modern kernels. Parse the kernel version reported by uname() into a 24-bit kernel.major.minor value without calling any library functions. If uname() is not supported or if the version format is not recognized, assume the kernel is modern. Tested-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 2e94e2f5d2bf2de124c8ad7da85463355e54ccb2)
2024-04-09aarch64: fix check for SVE support in assemblerSzabolcs Nagy2-4/+6
Due to GCC bug 110901 -mcpu can override -march setting when compiling asm code and thus a compiler targetting a specific cpu can fail the configure check even when binutils gas supports SVE. The workaround is that explicit .arch directive overrides both -mcpu and -march, and since that's what the actual SVE memcpy uses the configure check should use that too even if the GCC issue is fixed independently. Reviewed-by: Florian Weimer <fweimer@redhat.com> (cherry picked from commit 73c26018ed0ecd9c807bb363cc2c2ab4aca66a82)
2024-04-09aarch64: correct CFI in rawmemchr (bug 31113)Andreas Schwab1-1/+1
The .cfi_return_column directive changes the return column for the whole FDE range. But the actual intent is to tell the unwinder that the value in x30 (lr) now resides in x15 after the move, and that is expressed by the .cfi_register directive. (cherry picked from commit 3f798427884fa57770e8e2291cf58d5918254bb5)
2024-04-09AArch64: Remove Falkor memcpyWilco Dijkstra8-332/+1
The latest implementations of memcpy are actually faster than the Falkor implementations [1], so remove the falkor/phecda ifuncs for memcpy and the now unused IS_FALKOR/IS_PHECDA defines. [1] https://sourceware.org/pipermail/libc-alpha/2022-December/144227.html Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 2f5524cc5381eb75fef55f7901bb907bd5628333)
2024-04-09AArch64: Add memset_zva64Wilco Dijkstra6-68/+38
Add a specialized memset for the common ZVA size of 64 to avoid the overhead of reading the ZVA size. Since the code is identical to __memset_falkor, remove the latter. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 3d7090f14b13312320e425b27dcf0fe72de026fd)
2024-04-09AArch64: Cleanup emag memsetWilco Dijkstra4-197/+90
Cleanup emag memset - merge the memset_base64.S file, remove the unused ZVA code (since it is disabled on emag). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit 9627ab99b50d250c6dd3001a3355aa03692f7fe5)
2024-04-09AArch64: Cleanup ifuncsWilco Dijkstra18-125/+41
Cleanup ifuncs. Remove uses of libc_hidden_builtin_def, use ENTRY rather than ENTRY_ALIGN, remove unnecessary defines and conditional compilation. Rename strlen_mte to strlen_generic. Remove rtld-memset. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 9fd3409842b3e2d31cff5dbd6f96066c430f0aa2)
2024-04-09AArch64: Add support for MOPS memcpy/memmove/memsetWilco Dijkstra11-1/+141
Add support for MOPS in cpu_features and INIT_ARCH. Add ifuncs using MOPS for memcpy, memmove and memset (use .inst for now so it works with all binutils versions without needing complex configure and conditional compilation). Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 2bd00179885928fd95fcabfafc50e7b5c6e660d2)
2024-04-09Add HWCAP2_MOPS from Linux 6.5 to AArch64 bits/hwcap.hJoseph Myers1-0/+22
Linux 6.5 adds a new AArch64 HWCAP2 value, HWCAP2_MOPS. Add it to glibc's bits/hwcap.h. Tested with build-many-glibcs.py for aarch64-linux-gnu. (cherry picked from commit ff5d2abd18629e0efac41e31699cdff3be0e08fa)
2024-04-09AArch64: Improve SVE memcpy and memmoveWilco Dijkstra1-20/+14
Improve SVE memcpy by copying 2 vectors if the size is small enough. This improves performance of random memcpy by ~9% on Neoverse V1, and 33-64 byte copies are ~16% faster. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit d2d3f3720ce627a4fe154d8dd14db716a32bcc6e)
2024-04-09AArch64: Improve strrchrWilco Dijkstra1-25/+33
Use shrn for narrowing the mask which simplifies code and speeds up small strings. Unroll the first search loop to improve performance on large strings. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 55599d480437dcf129b41b95be32b48f2a9e5da9)
2024-04-09AArch64: Optimize strnlenWilco Dijkstra1-21/+18
Optimize strnlen using the shrn instruction and improve the main loop. Small strings are around 10% faster, large strings are 40% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit ad098893ba3c3344a5f2f6ab1627c47204afdb47)
2024-04-09AArch64: Optimize strlenWilco Dijkstra1-8/+12
Optimize strlen by unrolling the main loop. Large strings are 64% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 03c8ce5000198947a4dd7b2c14e5131738fda62b)
2024-04-09AArch64: Optimize strcpyWilco Dijkstra1-17/+19
Unroll the main loop. Large strings are around 20% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 349e48c01e85bd96006860084e76d322e6ca02f1)
2024-04-09AArch64: Improve strchrnulWilco Dijkstra1-2/+10
Unroll the main loop, which improves performance slightly. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 09ebd8549b2ce5a3a6c0c7c5f3e62227faf50a99)
2024-04-09AArch64: Optimize strchrWilco Dijkstra1-28/+24
Simplify calculation of the mask using shrn. Unroll the main loop. Small strings are 20% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 51541a229740801882490177fa178e49264b13fb)
2024-04-09AArch64: Improve strlen_asimdWilco Dijkstra1-12/+4
Use shrn for the mask, merge tst+bne into cbnz, and tweak code alignment. Performance improves slightly as a result. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 1bbb1a2022e126f21810d3d0ebe0a975d5243e43)
2024-04-09AArch64: Optimize memrchrWilco Dijkstra1-9/+11
Optimize the main loop - large strings are 43% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit 00776241776e67fc666b896c1e85770f4f3ec1e1)
2024-04-09AArch64: Optimize memchrWilco Dijkstra1-13/+14
Optimize the main loop - large strings are 40% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit ce758d4f063820c2bc743e12797d7454c66be718)
2024-04-09aarch64: Use memcpy_simd as the default memcpyWilco Dijkstra6-370/+81
Since __memcpy_simd is the fastest memcpy on almost all cores, replace the generic memcpy with it. If SVE is available, a SVE memcpy will be used by default (including for Neoverse N2). (cherry picked from commit e6f3fe362f1aab78b1448d69ecdbd9e3872636d3)
2024-04-09aarch64: Cleanup memset ifuncWilco Dijkstra2-17/+26
Cleanup memset ifunc selectors. The A64FX memset relies on a ZVA size of 256, so add an explicit check. (cherry picked from commit a8e72913fea0c6e2832c50523c60907ffa3b753b)
2024-04-09AArch64: Fix typo in sve configure check (BZ# 29394)Wilco Dijkstra2-4/+4
Fix a typo in the SVE configure check. This fixes [BZ# 29394]. (cherry picked from commit 12182ba18dabda791a4f63a11ee2e9d828f40f9b)
2024-04-09aarch64: Optimize string functions with shrn instructionDanila Kutenin6-102/+59
We found that string functions were using AND+ADDP to find the nibble/syndrome mask but there is an easier opportunity through `SHRN dst.8b, src.8h, 4` (shift right every 2 bytes by 4 and narrow to 1 byte) and has same latency on all SIMD ARMv8 targets as ADDP. There are also possible gaps for memcmp but that's for another patch. We see 10-20% savings for small-mid size cases (<=128) which are primary cases for general workloads. (cherry picked from commit 3c9980698988ef64072f1fac339b180f52792faf)