path: root/sysdeps/aarch64/multiarch/init-arch.h
Age | Commit message | Author | Files | Lines
2025-01-01 | Update copyright dates with scripts/update-copyrights | Paul Eggert | 1 | -1/+1
2024-03-21 | AArch64: Check kernel version for SVE ifuncs | Wilco Dijkstra | 1 | -0/+2
Old Linux kernels disable SVE after every system call. Calling the SVE-optimized memcpy afterwards will then cause a trap to reenable SVE. As a result, applications with a high use of syscalls may run slower with the SVE memcpy. This is true for kernels between 4.15.0 and before 6.2.0, except for 5.14.0 which was patched. Avoid this by checking the kernel version and selecting the SVE ifunc on modern kernels.

Parse the kernel version reported by uname() into a 24-bit kernel.major.minor value without calling any library functions. If uname() is not supported or if the version format is not recognized, assume the kernel is modern.

Tested-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
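The version check described above can be sketched as follows. This is an illustrative sketch, not the code from the commit: the names `KVER`, `parse_kernel_version` and `prefer_sve_ifunc` are hypothetical, the release string is assumed to come from the `release` field filled in by uname(), and saturating each field at 255 so that it fits in 8 bits is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pack kernel.major.minor into 24 bits, 8 bits per field.  */
#define KVER(k, ma, mi) \
  (((uint32_t) (k) << 16) | ((uint32_t) (ma) << 8) | (uint32_t) (mi))

/* Parse a release string such as "5.14.0-284.el9" using only character
   arithmetic, with no library calls.  Returns 0 if the format is not
   recognized.  Hypothetical helper, not the glibc function.  */
uint32_t
parse_kernel_version (const char *s)
{
  uint32_t field[3];

  if (s == NULL)
    return 0;
  for (int i = 0; i < 3; i++)
    {
      if (*s < '0' || *s > '9')
        return 0;
      uint32_t v = 0;
      while (*s >= '0' && *s <= '9')
        {
          v = v * 10 + (uint32_t) (*s++ - '0');
          if (v > 255)
            v = 255;		/* Assumption: saturate to fit 8 bits.  */
        }
      field[i] = v;
      /* The first two fields must be followed by a dot.  */
      if (i < 2 && *s++ != '.')
        return 0;
    }
  return KVER (field[0], field[1], field[2]);
}

/* Kernels from 4.15.0 up to but excluding 6.2.0 are treated as affected,
   except the patched 5.14.0.  Unrecognized versions count as modern.  */
bool
prefer_sve_ifunc (const char *release)
{
  uint32_t v = parse_kernel_version (release);

  if (v == 0 || v == KVER (5, 14, 0))
    return true;
  return v < KVER (4, 15, 0) || v >= KVER (6, 2, 0);
}
```

With these definitions, `prefer_sve_ifunc ("5.15.0")` is false while `prefer_sve_ifunc ("6.8.0-45-generic")` is true. Packing the three fields into one integer lets the range test use plain integer comparisons.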
2024-01-01 | Update copyright dates with scripts/update-copyrights | Paul Eggert | 1 | -1/+1
2023-10-24 | AArch64: Add support for MOPS memcpy/memmove/memset | Wilco Dijkstra | 1 | -1/+3
Add support for MOPS in cpu_features and INIT_ARCH. Add ifuncs using MOPS for memcpy, memmove and memset (use .inst for now so it works with all binutils versions without needing complex configure and conditional compilation).

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2023-01-06 | Update copyright dates with scripts/update-copyrights | Joseph Myers | 1 | -1/+1
2022-01-01 | Update copyright dates with scripts/update-copyrights | Paul Eggert | 1 | -1/+1
I used these shell commands:

    ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
    (cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted of lines saying "FOO: warning: copyright statement not found" for each of 7061 files FOO.

I then removed trailing white space from math/tgmath.h, support/tst-support-open-dev-null-range.c, and sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following obscure pre-commit check failure diagnostics from Savannah. I don't know why I run into these diagnostics whereas others evidently do not.

    remote: *** 912-#endif
    remote: *** 913:
    remote: *** 914-
    remote: *** error: lines with trailing whitespace found
    ...
    remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
2021-05-27 | aarch64: Added optimized memcpy and memmove for A64FX | Naohiro Tamura | 1 | -1/+3
This patch optimizes the performance of memcpy/memmove for A64FX [1], which implements ARMv8-A SVE and has a 64KB L1 cache per core and an 8MB L2 cache per NUMA node.

The optimization makes use of the Scalable Vector Registers with several techniques such as loop unrolling, memory access alignment, cache zero fill, and software pipelining.

The SVE assembler code for memcpy/memmove is implemented as Vector Length Agnostic code, so in theory it can run on any SoC that supports the ARMv8-A SVE standard.

We confirmed that all test cases pass by running 'make check' and 'make xcheck' not only on A64FX but also on ThunderX2.

We also confirmed, by running 'make bench', that performance with the 512-bit SVE vector registers is roughly 4 times better than with the 128-bit Advanced SIMD registers and 8 times better than with the 64-bit scalar registers.

[1] https://github.com/fujitsu/A64FX

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
2021-01-25 | aarch64: Fix the list of tested IFUNC variants [BZ #26818] | Szabolcs Nagy | 1 | -0/+2
Some IFUNC variants are not compatible with BTI and MTE, so don't set them as usable for testing and benchmarking on a BTI or MTE enabled system.

As far as IFUNC selectors are concerned, a system is BTI enabled if the CPU supports it and glibc was built with BTI branch protection.

Most IFUNC variants are BTI compatible, but the thunderx2 memcpy and memmove use a jump table with an indirect jump, without a BTI j.

Fixes bug 26818.
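The usability rule in this message can be sketched as a small predicate. Everything here is hypothetical and for illustration only: `struct ifunc_variant`, `bti_enabled` and `variant_usable` are not glibc names, and the compatibility flags would have to be filled in per variant by whoever maintains the list.

```c
#include <stdbool.h>

/* Hypothetical description of one IFUNC variant.  */
struct ifunc_variant
{
  const char *name;
  bool bti_compatible;	/* False if it jumps indirectly without a BTI j.  */
  bool mte_compatible;
};

/* A system counts as BTI enabled only if the CPU supports BTI and the
   library itself was built with BTI branch protection.  */
bool
bti_enabled (bool cpu_has_bti, bool built_with_bti)
{
  return cpu_has_bti && built_with_bti;
}

/* A variant is usable for testing and benchmarking unless the system has
   a feature enabled that the variant is not compatible with.  */
bool
variant_usable (const struct ifunc_variant *v, bool bti, bool mte)
{
  if (bti && !v->bti_compatible)
    return false;
  if (mte && !v->mte_compatible)
    return false;
  return true;
}
```

A variant that uses a jump table without BTI landing pads would be listed with `bti_compatible` set to false and is then skipped only when BTI is actually in effect.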
2021-01-25 | aarch64: Move and update the definition of MTE_ENABLED | Szabolcs Nagy | 1 | -1/+10
The hwcap value is now in Linux 5.10 and in glibc bits/hwcap.h, so use that definition. Move the definition to init-arch.h so all ifunc selectors can use it, and expose an "mte" shorthand for an MTE-enabled runtime.

For now we allow user code to enable tag checks and use PROT_MTE mappings without libc involvement. This is not guaranteed ABI, but can be useful for testing and debugging with MTE.
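A minimal sketch of such a hwcap test follows. The helper name `mte_enabled_from_hwcap2` is hypothetical; the fallback value of `HWCAP2_MTE` is bit 18, as defined by the Linux arm64 UAPI headers, and is only meaningful for an AArch64 hwcap2 word. In a real program the word would typically come from `getauxval (AT_HWCAP2)`.

```c
#include <stdbool.h>

/* Fallback for headers that predate the MTE hwcap (AArch64 only).  */
#ifndef HWCAP2_MTE
# define HWCAP2_MTE (1UL << 18)
#endif

/* Hypothetical helper: report whether the kernel advertises MTE in the
   given AArch64 AT_HWCAP2 word.  */
bool
mte_enabled_from_hwcap2 (unsigned long hwcap2)
{
  return (hwcap2 & HWCAP2_MTE) != 0;
}
```

Taking the hwcap2 word as a parameter keeps the check free of side effects, so every selector can share one cached value.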
2021-01-02 | Update copyright dates with scripts/update-copyrights | Paul Eggert | 1 | -1/+1
I used these shell commands:

    ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
    (cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted of lines saying "FOO: warning: copyright statement not found" for each of 6694 files FOO.

I then removed trailing white space from benchtests/bench-pthread-locks.c and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this diagnostic from Savannah:

    remote: *** pre-commit check failed ...
    remote: *** error: lines with trailing whitespace found
    remote: error: hook declined to update refs/heads/master
2020-01-01 | Update copyright dates with scripts/update-copyrights. | Joseph Myers | 1 | -1/+1
2019-09-07 | Prefer https to http for gnu.org and fsf.org URLs | Paul Eggert | 1 | -1/+1
Also, change sources.redhat.com to sourceware.org. This patch was automatically generated by running the following shell script, which uses GNU sed, and which avoids modifying files imported from upstream:

    sed -ri '
      s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
      s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
    ' \
      $(find $(git ls-files) -prune -type f \
          ! -name '*.po' \
          ! -name 'ChangeLog*' \
          ! -path COPYING ! -path COPYING.LIB \
          ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
          ! -path manual/texinfo.tex ! -path scripts/config.guess \
          ! -path scripts/config.sub ! -path scripts/install-sh \
          ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
          ! -path INSTALL ! -path locale/programs/charmap-kw.h \
          ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
          ! '(' -name configure \
                -execdir test -f configure.ac -o -f configure.in ';' ')' \
          ! '(' -name preconfigure \
                -execdir test -f preconfigure.ac ';' ')' \
          -print)

and then by running 'make dist-prepare' to regenerate files built from the altered files, and then executing the following to clean up:

    chmod a+x sysdeps/unix/sysv/linux/riscv/configure
    # Omit irrelevant whitespace and comment-only changes,
    # perhaps from a slightly-different Autoconf version.
    git checkout -f \
      sysdeps/csky/configure \
      sysdeps/hppa/configure \
      sysdeps/riscv/configure \
      sysdeps/unix/sysv/linux/csky/configure
    # Omit changes that caused a pre-commit check to fail like this:
    # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
    git checkout -f \
      sysdeps/powerpc/powerpc64/ppc-mcount.S \
      sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
    # Omit change that caused a pre-commit check to fail like this:
    # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
    git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
2019-01-01 | Update copyright dates with scripts/update-copyrights. | Joseph Myers | 1 | -1/+1
	* All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2018-01-01 | Update copyright dates with scripts/update-copyrights. | Joseph Myers | 1 | -1/+1
	* All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2017-11-20 | aarch64: Optimized memset for falkor | Siddhesh Poyarekar | 1 | -3/+5
The generic memset reads dczid_el0 on every memset. This has a significant impact on falkor for a range of sizes because reading dczid_el0 is slow. The DZP bit in the dczid_el0 register does not change dynamically, so it is safe to read once during program startup. With this patch dczid_el0 is read once during startup and zva_size is cached. This is used to invoke the falkor-specific memset; the generic memset routine remains unchanged.

The gains due to this are significant for falkor, with run time reductions as high as 48%. Here's a sample from the falkor tests:

Function: memset
Variant: walk
                         simple_memset   __memset_falkor   __memset_generic
============================================================================
length=256, char=0:    139.96 (-698.28%)     9.07 ( 48.26%)    17.53
length=257, char=0:    140.50 (-699.03%)     9.53 ( 45.80%)    17.58
length=258, char=0:    140.96 (-703.95%)     9.58 ( 45.36%)    17.53
length=259, char=0:    141.56 (-705.16%)     9.53 ( 45.79%)    17.58
length=260, char=0:    142.15 (-710.76%)     9.57 ( 45.39%)    17.53
length=261, char=0:    142.50 (-710.39%)     9.53 ( 45.78%)    17.58
length=262, char=0:    142.97 (-715.09%)     9.57 ( 45.42%)    17.54
length=263, char=0:    143.51 (-716.18%)     9.53 ( 45.80%)    17.58
length=264, char=0:    143.93 (-720.55%)     9.58 ( 45.39%)    17.54
length=265, char=0:    144.56 (-722.07%)     9.53 ( 45.80%)    17.59
length=266, char=0:    144.98 (-726.42%)     9.58 ( 45.42%)    17.54
length=267, char=0:    145.53 (-727.53%)     9.53 ( 45.80%)    17.59
length=268, char=0:    146.25 (-731.81%)     9.53 ( 45.79%)    17.58
length=269, char=0:    146.52 (-735.39%)     9.53 ( 45.66%)    17.54
length=270, char=0:    146.97 (-735.81%)     9.53 ( 45.80%)    17.58
length=271, char=0:    147.54 (-741.08%)     9.58 ( 45.38%)    17.54
length=512, char=0:    268.26 (-1307.85%)   12.06 ( 36.71%)    19.05
length=513, char=0:    268.73 (-1273.89%)   13.56 ( 30.68%)    19.56
length=514, char=0:    269.31 (-1276.89%)   13.56 ( 30.68%)    19.56
length=515, char=0:    269.73 (-1279.05%)   13.56 ( 30.68%)    19.56
length=516, char=0:    270.34 (-1282.24%)   13.56 ( 30.67%)    19.56
length=517, char=0:    270.83 (-1284.71%)   13.56 ( 30.66%)    19.56
length=518, char=0:    271.20 (-1286.54%)   13.56 ( 30.67%)    19.56
length=519, char=0:    271.67 (-1288.67%)   13.65 ( 30.24%)    19.56
length=520, char=0:    272.14 (-1291.04%)   13.65 ( 30.22%)    19.56
length=521, char=0:    272.66 (-1293.69%)   13.65 ( 30.23%)    19.56
length=522, char=0:    273.14 (-1296.13%)   13.65 ( 30.20%)    19.56
length=523, char=0:    273.64 (-1298.75%)   13.65 ( 30.23%)    19.56
length=524, char=0:    274.34 (-1302.16%)   13.66 ( 30.20%)    19.57
length=525, char=0:    274.64 (-1297.78%)   13.56 ( 30.99%)    19.65
length=526, char=0:    275.20 (-1300.04%)   13.56 ( 31.01%)    19.66
length=527, char=0:    275.66 (-1302.86%)   13.56 ( 30.99%)    19.65
length=1024, char=0:   524.46 (-2169.75%)   20.12 ( 12.92%)    23.11
length=1025, char=0:   525.14 (-2124.63%)   21.62 (  8.40%)    23.61
length=1026, char=0:   525.59 (-2125.36%)   21.88 (  7.37%)    23.62
length=1027, char=0:   525.98 (-2127.14%)   21.62 (  8.46%)    23.62
length=1028, char=0:   526.68 (-2131.10%)   21.62 (  8.42%)    23.61
length=1029, char=0:   527.10 (-2131.70%)   21.79 (  7.73%)    23.62
length=1030, char=0:   527.54 (-2118.51%)   21.62 (  9.10%)    23.78
length=1031, char=0:   527.98 (-2136.37%)   21.62 (  8.43%)    23.61
length=1032, char=0:   528.70 (-2139.38%)   21.62 (  8.43%)    23.61
length=1033, char=0:   529.25 (-2124.37%)   21.62 (  9.11%)    23.79
length=1034, char=0:   529.48 (-2142.95%)   21.62 (  8.43%)    23.61
length=1035, char=0:   530.11 (-2145.13%)   21.62 (  8.44%)    23.61
length=1036, char=0:   530.76 (-2147.10%)   21.79 (  7.73%)    23.62
length=1037, char=0:   531.03 (-2149.45%)   21.62 (  8.42%)    23.61
length=1038, char=0:   531.64 (-2151.87%)   21.62 (  8.42%)    23.61
length=1039, char=0:   531.99 (-2151.63%)   21.80 (  7.75%)    23.63

	* sysdeps/aarch64/memset-reg.h: New file.
	* sysdeps/aarch64/memset.S: Use it.
	(__memset): Rename to MEMSET macro.
	[ZVA_MACRO]: Use zva_macro.
	* sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add memset_generic and memset_falkor.
	* sysdeps/aarch64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add memset ifuncs.
	* sysdeps/aarch64/multiarch/init-arch.h (INIT_ARCH): New local variable zva_size.
	* sysdeps/aarch64/multiarch/memset.c: New file.
	* sysdeps/aarch64/multiarch/memset_generic.S: New file.
	* sysdeps/aarch64/multiarch/memset_falkor.S: New file.
	* sysdeps/aarch64/multiarch/rtld-memset.S: New file.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.c (DCZID_DZP_MASK): New macro.
	(DCZID_BS_MASK): Likewise.
	(init_cpu_features): Read and set zva_size.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.h (struct cpu_features): New member zva_size.
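The read-once approach described in this commit can be sketched as below. This is an illustrative sketch rather than the committed code: the function names are hypothetical, and the mask values follow the Arm architecture definition of DCZID_EL0 (DZP in bit 4, BS in bits 3:0, with BS giving log2 of the block size in 4-byte words). A result of 0 is used here to mean that DC ZVA is prohibited.

```c
#include <stdint.h>

#define DCZID_DZP_MASK (1u << 4)	/* Set when DC ZVA is prohibited.  */
#define DCZID_BS_MASK  0xfu		/* log2 of the block size in words.  */

/* Decode the ZVA block size in bytes from a DCZID_EL0 value, or return 0
   if zeroing by DC ZVA is prohibited.  Hypothetical helper name.  */
unsigned int
zva_size_from_dczid (uint64_t dczid)
{
  if (dczid & DCZID_DZP_MASK)
    return 0;
  return 4u << (dczid & DCZID_BS_MASK);
}

#ifdef __aarch64__
/* Read the register; intended to be called once at startup, with the
   decoded size stored for later memset calls.  */
static inline uint64_t
read_dczid_el0 (void)
{
  uint64_t value;
  __asm__ ("mrs %0, dczid_el0" : "=r" (value));
  return value;
}
#endif
```

For example, a register value of 4 decodes to a 64-byte block, and any value with bit 4 set decodes to 0. Keeping the decoding separate from the register read makes the arithmetic checkable on any host.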
2017-05-24 | aarch64: Thunderx specific memcpy and memmove | Steve Ellcey | 1 | -0/+23
	* sysdeps/aarch64/memcpy.S (MEMMOVE, MEMCPY): New macros.
	(memmove): Use MEMMOVE for name.
	(memcpy): Use MEMCPY for name. Change internal labels to external labels.
	* sysdeps/aarch64/multiarch/Makefile: New file.
	* sysdeps/aarch64/multiarch/ifunc-impl-list.c: Likewise.
	* sysdeps/aarch64/multiarch/init-arch.h: Likewise.
	* sysdeps/aarch64/multiarch/memcpy.c: Likewise.
	* sysdeps/aarch64/multiarch/memcpy_generic.S: Likewise.
	* sysdeps/aarch64/multiarch/memcpy_thunderx.S: Likewise.
	* sysdeps/aarch64/multiarch/memmove.c: Likewise.