Age Commit message Author Files Lines
2016-04-12localedata: LC_MEASUREMENT: use copy directives everywhereMike Frysinger306-613/+615
There are only two measurement systems that locales use: US and metric. For the former, move to copying the en_US locale, while for the latter, move to copying the i18n locale. This lets us clean up all the stray comments like FIXME. There should be no functional differences here.
2016-04-12localedata: CLDRv29: update LC_IDENTIFICATION language/territory fieldsMike Frysinger96-132/+230
This updates all the territory fields based on CLDR v29 data. Many of them were obviously incorrect where people used a two letter code and not the English name. aa_DJ: changing DJ to Djibouti aa_ER@saaho: changing ER to Eritrea aa_ER: changing ER to Eritrea aa_ET: changing ET to Ethiopia am_ET: changing ET to Ethiopia ar_LY: changing Libyan Arab Jamahiriya to Libya ar_SY: changing Syrian Arab Republic to Syria bo_CN: changing P.R. of China to China bs_BA: changing Bosnia and Herzegowina to Bosnia & Herzegovina byn_ER: changing ER to Eritrea ca_IT: changing Italy (L'Alguer) to Italy ce_RU: changing RUSSIAN FEDERATION to Russia cmn_TW: changing Republic of China to Taiwan cy_GB: changing Great Britain to United Kingdom de_LU@euro: changing Luxemburg to Luxembourg de_LU: changing Luxemburg to Luxembourg en_AG: changing Antigua and Barbuda to Antigua & Barbuda en_GB: changing Great Britain to United Kingdom en_HK: changing Hong Kong to Hong Kong SAR China en_US: changing USA to United States es_US: changing USA to United States fr_LU@euro: changing Luxemburg to Luxembourg fr_LU: changing Luxemburg to Luxembourg fy_DE: changing DE to Germany gd_GB: changing Great Britain to United Kingdom gez_ER@abegede: changing ER to Eritrea gez_ER: changing ER to Eritrea gez_ET@abegede: changing ET to Ethiopia gez_ET: changing ET to Ethiopia gv_GB: changing Britain to United Kingdom hak_TW: changing Republic of China to Taiwan iu_CA: changing CA to Canada ko_KR: changing Republic of Korea to South Korea kw_GB: changing Britain to United Kingdom li_BE: changing BE to Belgium li_NL: changing NL to Netherlands lzh_TW: changing Republic of China to Taiwan my_MM: changing Myanmar to Myanmar (Burma) nan_TW: changing Republic of China to Taiwan nds_DE: changing DE to Germany nds_NL: changing NL to Netherlands om_ET: changing ET to Ethiopia om_KE: changing KE to Kenya pap_AW: changing AW to Aruba pap_CW: changing CW to Curaçao pt_BR: changing Brasil to Brazil sid_ET: changing ET to Ethiopia 
sk_SK: changing Slovak to Slovakia so_DJ: changing DJ to Djibouti so_ET: changing ET to Ethiopia so_KE: changing KE to Kenya so_SO: changing SO to Somalia ti_ER: changing ER to Eritrea ti_ET: changing ET to Ethiopia tig_ER: changing ER to Eritrea tt_RU@iqtelif: changing Tatarstan, Russian Federation to Russia uk_UA: changing UA to Ukraine unm_US: changing USA to United States wal_ET: changing ET to Ethiopia yi_US: changing USA to United States yue_HK: changing Hong Kong to Hong Kong SAR China zh_CN: changing P.R. of China to China zh_HK: changing Hong Kong to Hong Kong SAR China zh_TW: changing Taiwan R.O.C. to Taiwan This updates all the language fields based on CLDR v29 data. Many of them were obviously incorrect where people used a two letter code and not the English name. aa_DJ: changing aa to Afar aa_ER: changing aa to Afar aa_ER@saaho: changing aa to Afar aa_ET: changing aa to Afar am_ET: changing am to Amharic az_AZ: changing Azeri to Azerbaijani bn_BD: changing Bengali/Bangla to Bengali byn_ER: changing byn to Blin de_AT: changing German to Austrian German de_CH: changing German to Swiss High German en_AU: changing English to Australian English en_CA: changing English to Canadian English en_GB: changing English to British English en_US: changing English to American English es_ES: changing Spanish to European Spanish es_MX: changing Spanish to Mexican Spanish ff_SN: changing ff to Fulah fr_CA: changing French to Canadian French fr_CH: changing French to Swiss French fur_IT: changing Furlan to Friulian fy_DE: changing fy to Western Frisian fy_NL: changing Frisian to Western Frisian gd_GB: changing Scots Gaelic to Scottish Gaelic gez_ER@abegede: changing gez to Geez gez_ER: changing gez to Geez gez_ET@abegede: changing gez to Geez gez_ET: changing gez to Geez gv_GB: changing Manx Gaelic to Manx ht_HT: changing Kreyol to Haitian Creole kl_GL: changing Greenlandic to Kalaallisut lg_UG: changing Luganda to Ganda li_BE: changing li to Limburgish li_NL: changing li 
to Limburgish nan_TW@latin: changing Minnan to Min Nan Chinese nb_NO: changing Norwegian, Bokmål to Norwegian Bokmål nds_DE: changing nds to Low German nds_NL: changing nds to Low Saxon niu_NU: changing Vagahau Niue (Niuean) to Niuean niu_NZ: changing Vagahau Niue (Niuean) to Niuean nl_BE: changing Dutch to Flemish nn_NO: changing Norwegian, Nynorsk to Norwegian Nynorsk nr_ZA: changing Southern Ndebele to South Ndebele om_ET: changing om to Oromo om_KE: changing om to Oromo or_IN: changing Odia to Oriya os_RU: changing Ossetian to Ossetic pap_AW: changing pap to Papiamento pap_CW: changing pap to Papiamento pa_PK: changing Punjabi (Shahmukhi) to Punjabi pt_BR: changing Portuguese to Brazilian Portuguese pt_PT: changing Portuguese to European Portuguese se_NO: changing Northern Saami to Northern Sami sid_ET: changing sid to Sidamo so_DJ: changing so to Somali so_ET: changing so to Somali so_KE: changing so to Somali so_SO: changing so to Somali st_ZA: changing Sotho to Southern Sotho sw_KE: changing sw to Swahili sw_TZ: changing sw to Swahili ti_ER: changing ti to Tigrinya ti_ET: changing ti to Tigrinya tig_ER: changing tig to Tigre uk_UA: changing uk to Ukrainian wal_ET: changing wal to Wolaytta yue_HK: changing Yue Chinese to Cantonese
2016-04-12localedata: LC_TIME.date_fmt: delete entries same as the default valueMike Frysinger140-592/+142
There's no real value in populating this field when it's the same as the default POSIX setting, so drop it from most locales so it's clear what's going on.
2016-04-12X86-64: Use non-temporal store in memcpy on large dataH.J. Lu6-171/+260
The large memcpy micro benchmark in glibc shows that there is a regression with large data on Haswell machine. non-temporal store in memcpy on large data can improve performance significantly. This patch adds a threshold to use non temporal store which is 6 times of shared cache size. When size is above the threshold, non temporal store will be used, but avoid non-temporal store if there is overlap between destination and source since destination may be in cache when source is loaded. For size below 8 vector register width, we load all data into registers and store them together. Only forward and backward loops, which move 4 vector registers at a time, are used to support overlapping addresses. For forward loop, we load the last 4 vector register width of data and the first vector register width of data into vector registers before the loop and store them after the loop. For backward loop, we load the first 4 vector register width of data and the last vector register width of data into vector registers before the loop and store them after the loop. [BZ #19928] * sysdeps/x86_64/cacheinfo.c (__x86_shared_non_temporal_threshold): New. (init_cacheinfo): Set __x86_shared_non_temporal_threshold to 6 times of shared cache size. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S (VMOVNT): New. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S (VMOVNT): Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S (VMOVNT): Likewise. (VMOVU): Changed to movups for smaller code sizes. (VMOVA): Changed to movaps for smaller code sizes. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Update comments. (PREFETCH): New. (PREFETCH_SIZE): Likewise. (PREFETCHED_LOAD_SIZE): Likewise. (PREFETCH_ONE_SET): Likewise. Rewrite to use forward and backward loops, which move 4 vector registers at a time, to support overlapping addresses and use non temporal store if size is above the threshold and there is no overlap between destination and source.
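The selection rule described above (size above 6 times the shared cache size, and no overlap between source and destination) can be sketched in C. This is only an illustration of the decision, not the assembly in memmove-vec-unaligned-erms.S; the helper name `use_non_temporal_store` and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of glibc: decides whether a copy of
   SIZE bytes should use non-temporal stores.  */
static bool
use_non_temporal_store (const void *dst, const void *src, size_t size,
                        size_t shared_cache_size)
{
  /* Threshold from the commit message: 6 times the shared cache size.  */
  size_t threshold = shared_cache_size * 6;
  if (size <= threshold)
    return false;

  /* Avoid non-temporal stores when the two ranges overlap, since the
     destination may already be in cache once the source is loaded.  */
  uintptr_t d = (uintptr_t) dst;
  uintptr_t s = (uintptr_t) src;
  uintptr_t distance = d > s ? d - s : s - d;
  return distance >= size;
}
```

The overlap test compares the distance between the two start addresses with the copy size; two ranges of equal length overlap exactly when that distance is smaller than the length.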
2016-04-12VDSO support for MIPSMatthew Fortune8-0/+149
This patch adds support for using the implementations of gettimeofday() and clock_gettime() provided by the kernel in the VDSO. The VDSO will always provide clock_gettime() as CLOCK_{REALTIME,MONOTONIC}_COARSE can be implemented regardless of platform. CLOCK_{REALTIME,MONOTONIC}, along with gettimeofday(), are only implemented on platforms which make use of either the CP0 count or GIC as their clocksource. On other platforms, the VDSO does not provide the __vdso_gettimeofday symbol, as it is never useful. The VDSO functions return ENOSYS when they encounter an unsupported request, in which case glibc should fall back to the standard syscall. Tested with upstream kernel 4.5 and QEMU emulating Malta. ./vdsotest gettimeofday bench gettimeofday: syscall: 1021 nsec/call gettimeofday: libc: 262 nsec/call gettimeofday: vdso: 174 nsec/call * sysdeps/unix/sysv/linux/mips/Makefile (sysdep_routines): Include dl-vdso. * sysdeps/unix/sysv/linux/mips/Versions: Add __vdso_clock_gettime. * sysdeps/unix/sysv/linux/mips/init-first.c: New file. * sysdeps/unix/sysv/linux/mips/libc-vdso.h: New file. * sysdeps/unix/sysv/linux/mips/mips32/sysdep.h: (INTERNAL_VSYSCALL_CALL): Define to be compatible with MIPS definitions of INTERNAL_SYSCALL_{ERROR_P,ERRNO}. (HAVE_CLOCK_GETTIME_VSYSCALL): Define. (HAVE_GETTIMEOFDAY_VSYSCALL): Define. * sysdeps/unix/sysv/linux/mips/mips64/n32/sysdep.h: Likewise. * sysdeps/unix/sysv/linux/mips/mips64/n64/sysdep.h: Likewise.
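The fallback behaviour described above can be sketched as follows. This is a simplified model, not the glibc code: the `time_fn` type, `time_with_fallback`, and the stub functions are all hypothetical, and the sketch assumes the VDSO entry point reports an unsupported request by returning `-ENOSYS`.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical signature standing in for a clock or time-of-day call.  */
typedef int (*time_fn) (long *out);

/* Try the VDSO entry point first; fall back to the syscall when the
   VDSO symbol is absent or reports -ENOSYS (assumed convention).  */
static int
time_with_fallback (time_fn vdso_fn, time_fn syscall_fn, long *out)
{
  if (vdso_fn != NULL)
    {
      int ret = vdso_fn (out);
      if (ret != -ENOSYS)
        return ret;
    }
  return syscall_fn (out);
}

/* Stubs for illustration only.  */
static int stub_vdso_unsupported (long *out) { (void) out; return -ENOSYS; }
static int stub_vdso_ok (long *out) { *out = 1; return 0; }
static int stub_syscall (long *out) { *out = 2; return 0; }
```

Passing `NULL` for `vdso_fn` models a platform where the VDSO does not export the symbol at all, as the message describes for `__vdso_gettimeofday`.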
2016-04-11Consolidate pwrite/pwrite64 implementationsAdhemerval Zanella17-427/+52
This patch consolidates all the pwrite/pwrite64 implementations for Linux in only one (sysdeps/unix/sysv/linux/pwrite{64}.c). It also removes the syscall from the auto-generation using assembly macros. For pwrite{64} offset argument placement the new SYSCALL_LL{64} macro is used. pwrite ports that do not define __NR_pwrite will use __NR_pwrite64, and pwrite64 ports that do not define __NR_pwrite64 will use __NR_pwrite for the syscall. Checked on x86_64, x32, i386, aarch64, and ppc64le. * sysdeps/unix/sysv/linux/arm/pwrite.c: Remove file. * sysdeps/unix/sysv/linux/arm/pwrite64.c: Likewise. * sysdeps/unix/sysv/linux/generic/wordsize-32/pwrite.c: Likewise. * sysdeps/unix/sysv/linux/generic/wordsize-32/pwrite64.c: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc32/pwrite.c: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc32/pwrite64.c: Likewise. * sysdeps/unix/sysv/linux/wordsize-64/pwrite64.c: Likewise. * sysdeps/unix/sysv/linux/wordsize-64/syscalls.list (pwrite): Remove syscalls generation. * sysdeps/unix/sysv/linux/powerpc/powerpc32/sysdep.h [__NR_pwrite64] (__NR_write): Remove define. * sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h [__NR_pwrite64] (__NR_write): Remove define. * sysdeps/unix/sysv/linux/pwrite.c [__NR_pwrite64] (__NR_pwrite): Remove define. (__libc_pwrite): Use SYSCALL_LL macro on offset argument. * sysdeps/unix/sysv/linux/pwrite64.c [__NR_pwrite64] (__NR_pwrite): Remove define. (__libc_pwrite64): Use SYSCALL_LL64 macro on offset argument. * sysdeps/unix/sysv/linux/sh/pwrite.c: Rewrite using default Linux implementation as base. * sysdeps/unix/sysv/linux/sh/pwrite64.c: Likewise. * sysdeps/unix/sysv/linux/mips/pwrite.c: Likewise. * sysdeps/unix/sysv/linux/mips/pwrite64.c: Likewise.
2016-04-11Consolidate pread/pread64 implementationsAdhemerval Zanella18-414/+53
This patch consolidates all the pread/pread64 implementations for Linux in only one (sysdeps/unix/sysv/linux/pread.c). It also removes the syscall from the auto-generation using assembly macros. For pread{64} offset argument placement the new SYSCALL_LL{64} macro is used. pread ports that do not define __NR_pread will use __NR_pread64, and pread64 ports that do not define __NR_pread64 will use __NR_pread for the syscall. Checked on x86_64, x32, i386, aarch64, and ppc64le. * sysdeps/unix/sysv/linux/arm/pread.c: Remove file. * sysdeps/unix/sysv/linux/arm/pread64.c: Likewise. * sysdeps/unix/sysv/linux/generic/wordsize-32/pread.c: Likewise. * sysdeps/unix/sysv/linux/generic/wordsize-32/pread64.c: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc32/pread.c: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc32/pread64.c: Likewise. * sysdeps/unix/sysv/linux/wordsize-64/pread64.c: Likewise. * sysdeps/unix/sysv/linux/wordsize-64/syscalls.list (pread): Remove syscall generation. * sysdeps/unix/sysv/linux/powerpc/powerpc32/sysdep.h [__NR_pread64] (__NR_pread): Remove define. * sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h: [__NR_pread64] (__NR_pread): Likewise. * sysdeps/unix/sysv/linux/pread.c [__NR_pread64] (__NR_pread): Remove define. (__libc_pread): Use SYSCALL_LL macro on offset argument. * sysdeps/unix/sysv/linux/pread64.c [__NR_pread64] (__NR_pread): Remove define. (__libc_pread64): Use SYSCALL_LL64 macro on offset argument. * sysdeps/unix/sysv/linux/sh/pread.c: Rewrite using default Linux implementation as base. * sysdeps/unix/sysv/linux/sh/pread64.c: Likewise. * sysdeps/unix/sysv/linux/mips/pread.c: Likewise. * sysdeps/unix/sysv/linux/mips/pread64.c: Likewise.
2016-04-11Consolidate off_t/off64_t syscall argument passingAdhemerval Zanella5-10/+47
This patch adds three new macros (SYSCALL_LL, SYSCALL_LL64, and __ASSUME_WORDSIZE64_ILP32) to use along with off_t and off64_t argument syscalls. The rationale for this change is: 1. Remove multiple implementations of the same syscall for different architectures (for instance, pread has 6 different implementations). 2. Also remove the requirement to use syscall wrappers for cancellable entrypoints. The macros should be used along with __ALIGNMENT_ARG to follow ABI constraints on architectures where it applies. For instance, pread can be rewritten as: return SYSCALL_CANCEL (pread, fd, buf, count, __ALIGNMENT_ARG SYSCALL_LL (offset)); Another macro, SYSCALL_LL64, is provided for off64_t. The macro __ASSUME_WORDSIZE64_ILP32 is defined by an ABI to state that it uses 64-bit registers even though the ABI is ILP32 (for instance x32 and mips64-n32). The changes themselves are not currently used in any implementation, so no code change is expected. * sysdeps/unix/sysv/linux/generic/sysdep.h (__ALIGNMENT_ARG): Move definition. (__ALIGNMENT_COUNT): Likewise. * sysdeps/unix/sysv/linux/sysdep.h (__ALIGNMENT_ARG): To here. (__ALIGNMENT_COUNT): Likewise. (SYSCALL_LL): New define. (SYSCALL_LL64): Likewise. * sysdeps/unix/sysv/linux/mips/kernel-features.h: [_MIPS_SIM == _ABIO32] (__ASSUME_WORDSIZE64_ILP32): Define. * sysdeps/unix/sysv/linux/x86_64/kernel-features.h: [ILP32] (__ASSUME_WORDSIZE64_ILP32): Likewise.
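On ABIs with 32-bit registers, a 64-bit offset has to be passed to the kernel as two 32-bit halves. The following sketch shows only that splitting step; it is not the definition of SYSCALL_LL or SYSCALL_LL64, and `struct reg_pair` and `split_offset64` are hypothetical names. The order in which the halves are placed in registers depends on the ABI and is not modelled here.

```c
#include <stdint.h>

/* Hypothetical pair of 32-bit register values.  */
struct reg_pair
{
  uint32_t high;
  uint32_t low;
};

/* Split a 64-bit file offset into its upper and lower 32 bits.  */
static struct reg_pair
split_offset64 (int64_t offset)
{
  struct reg_pair p;
  p.high = (uint32_t) ((uint64_t) offset >> 32);
  p.low = (uint32_t) ((uint64_t) offset & 0xffffffffu);
  return p;
}
```

On ABIs that pass the offset in a single 64-bit register (the case __ASSUME_WORDSIZE64_ILP32 is meant to describe), no such splitting is needed.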
2016-04-11Define __ASSUME_ALIGNED_REGISTER_PAIRS for missing portsAdhemerval Zanella4-0/+25
This patch defines __ASSUME_ALIGNED_REGISTER_PAIRS for the missing ports that require 64-bit value (e.g., long long) to be aligned to an even register pair in argument passing. No code change is expected, tested with builds for powerpc32, mips-o32, and armhf. * sysdeps/unix/sysv/linux/arm/kernel-features.h (__ASSUME_ALIGNED_REGISTER_PAIRS): Define. * sysdeps/unix/sysv/linux/mips/kernel-features.h [_MIPS_SIM == _ABIO32] (__ASSUME_ALIGNED_REGISTER_PAIRS): Likewise. * sysdeps/unix/sysv/linux/powerpc/kernel-features.h [!__powerpc64__] (__ASSUME_ALIGNED_REGISTER_PAIRS): Likewise.
2016-04-11nss_dns: Fix assertion failure in _nss_dns_getcanonname_r [BZ #19865]Florian Weimer2-0/+13
2016-04-11Add missing bug number to ChangeLogFlorian Weimer1-0/+1
2016-04-11Fix build with HAVE_AUX_VECTORSamuel Thibault3-6/+9
* sysdeps/unix/sysv/linux/ldsodefs.h (HAVE_AUX_VECTOR): Define before including <ldsodefs.h>. * sysdeps/nacl/ldsodefs.h (HAVE_AUX_VECTOR): Likewise.
2016-04-10Fix crash on getauxval call without HAVE_AUX_VECTORSamuel Thibault3-0/+11
* sysdeps/generic/ldsodefs.h (struct rtld_global_ro) [!HAVE_AUX_VECTOR]: Do not define _dl_auxv field. * misc/getauxval.c (__getauxval) [!HAVE_AUX_VECTOR]: Do not go through GLRO(dl_auxv) list.
2016-04-09Allow overriding of CFLAGS as well as CPPFLAGS for rtld.Nick Alcock3-2/+8
We need this to pass -fno-stack-protector to all the pieces of rtld in non-elf/ directories.
2016-04-09When disabling SSE, make sure -fpmath is not set to use SSE eitherKhem Raj2-1/+8
This fixes errors when SSE options are injected through CFLAGS: now that -Werror is turned on by default, this warning turns into an error on x86: $ gcc -m32 -march=core2 -mtune=core2 -msse3 -mfpmath=sse -x c /dev/null -S -mno-sse -mno-mmx /dev/null:1:0: warning: SSE instruction set disabled, using 387 arithmetics Whereas: $ gcc -m32 -march=core2 -mtune=core2 -msse3 -mfpmath=sse -x c /dev/null -S -mno-sse -mno-mmx -mfpmath=387 generates no warnings.
2016-04-09localedata: CLDRv28: update LC_PAPER valuesMike Frysinger3-2/+7
These locales should be using A4 paper size rather than US-Letter. Update the copy points to match the others in the file. All other locales have been verified against the CLDR and hand checking.
2016-04-09configure: fix `test ==` usageMike Frysinger5-6/+13
POSIX defines the = operator, but not ==. Fix the few places where we incorrectly used ==.
2016-04-08localedata: iw_IL: delete old/deprecated locale [BZ #16137]Mike Frysinger8-174/+18
From the bug: Obsolete locale. The ISO-639 code for Hebrew was changed from 'iw' to 'he' in 1989, according to Bruno Haible on libc-alpha 2003-09-01. Reported-by: Chris Leonard <cjlhomeaddress@gmail.com>
2016-04-08Fix limits.h NL_NMAX namespace (bug 19929).Joseph Myers3-2/+9
bits/xopen_lim.h (included by limits.h if __USE_XOPEN) defines NL_NMAX, but this constant was removed in the 2008 edition of POSIX so should not be defined in that case. This patch duly disables that define for __USE_XOPEN2K8. It remains enabled for __USE_GNU to avoid affecting sysconf (_SC_NL_NMAX), the implementation of which uses "#ifdef NL_NMAX". Tested for x86_64 and x86 (testsuite, and that installed stripped shared libraries are unchanged by the patch). [BZ #19929] * include/bits/xopen_lim.h (NL_NMAX): Do not define if [__USE_XOPEN2K8 && !__USE_GNU]. * conform/Makefile (test-xfail-XOPEN2K8/limits.h/conform): Remove variable.
2016-04-08localedata: i18n: fix typos in tel_int_fmtMike FABIAN2-1/+5
Adding the %t avoids a double space if the area code %a happens to be empty. There are countries without area codes.
2016-04-08Fix termios.h XCASE namespace (bug 19925).Joseph Myers7-7/+19
bits/termios.h (various versions under sysdeps/unix/sysv/linux) defines XCASE if defined __USE_MISC || defined __USE_XOPEN. This macro was removed in the 2001 edition of POSIX, and is not otherwise reserved, so should not be defined for 2001 and later versions of POSIX. This patch fixes the conditions accordingly (leaving the macro defined for __USE_MISC, so still in the default namespace). Tested for x86_64 and x86 (testsuite, and that installed shared libraries are unchanged by the patch). [BZ #19925] * sysdeps/unix/sysv/linux/alpha/bits/termios.h (XCASE): Do not define if [!__USE_MISC && __USE_XOPEN2K]. * sysdeps/unix/sysv/linux/bits/termios.h (XCASE): Likewise. * sysdeps/unix/sysv/linux/mips/bits/termios.h (XCASE): Likewise. * sysdeps/unix/sysv/linux/powerpc/bits/termios.h (XCASE): Likewise. * sysdeps/unix/sysv/linux/sparc/bits/termios.h (XCASE): Likewise. * conform/Makefile (test-xfail-XOPEN2K/termios.h/conform): Remove variable. (test-xfail-XOPEN2K8/termios.h/conform): Likewise.
2016-04-07powerpc: Add optimized P8 strspnPaul E. Murphy7-1/+304
This utilizes vectors and bitmasks. For a small needle and a large haystack, the performance improvement is up to 8x. For short strings (0-4B), the cost of computing the bitmask dominates, and the routine is a tad slower.
2016-04-07hsearch_r: Include <limits.h>Florian Weimer2-0/+5
It is needed for UINT_MAX.
2016-04-07scratch_buffer_set_array_size: Include <limits.h>Florian Weimer2-0/+5
It is needed for CHAR_BIT.
2016-04-06X86-64: Prepare memmove-vec-unaligned-erms.SH.J. Lu2-54/+95
Prepare memmove-vec-unaligned-erms.S to make the SSE2 version as the default memcpy, mempcpy and memmove. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (MEMCPY_SYMBOL): New. (MEMPCPY_SYMBOL): Likewise. (MEMMOVE_CHK_SYMBOL): Likewise. Replace MEMMOVE_SYMBOL with MEMMOVE_CHK_SYMBOL on __mempcpy_chk symbols. Replace MEMMOVE_SYMBOL with MEMPCPY_SYMBOL on __mempcpy symbols. Provide alias for __memcpy_chk in libc.a. Provide alias for memcpy in libc.a and ld.so.
2016-04-06X86-64: Prepare memset-vec-unaligned-erms.SH.J. Lu2-13/+28
Prepare memset-vec-unaligned-erms.S to make the SSE2 version the default memset. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (MEMSET_CHK_SYMBOL): New. Define if not defined. (__bzero): Check VEC_SIZE == 16 instead of USE_MULTIARCH. Disabled for now. Replace MEMSET_SYMBOL with MEMSET_CHK_SYMBOL on __memset_chk symbols. Properly check USE_MULTIARCH on __memset symbols.
2016-04-06Add memcpy/memmove/memset benchmarks with large dataH.J. Lu6-2/+393
Add memcpy, memmove and memset benchmarks with large data sizes. * benchtests/Makefile (string-benchset): Add memcpy-large, memmove-large and memset-large. * benchtests/bench-memcpy-large.c: New file. * benchtests/bench-memmove-large.c: Likewise. * benchtests/bench-memset-large.c: Likewise. * benchtests/bench-string.h (TIMEOUT): Don't redefine.
2016-04-06Mention Bug in ChangeLog for S390: Save and restore fprs/vrs while resolving symbols.Stefan Liebler1-0/+1
The Bugzilla 19916 is added to the ChangeLog for commit 4603c51ef7989d7eb800cdd6f42aab206f891077.
2016-04-05Force 32-bit displacement in memset-vec-unaligned-erms.SH.J. Lu2-0/+18
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Force 32-bit displacement to avoid long nop between instructions.
2016-04-05Add a comment in memset-sse2-unaligned-erms.SH.J. Lu2-0/+7
* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Add a comment on VMOVU and VMOVA.
2016-04-04strfmon_l: Use specified locale for number formatting [BZ #19633]Florian Weimer7-25/+301
2016-04-03Don't put SSE2/AVX/AVX512 memmove/memset in ld.soH.J. Lu7-32/+51
Since memmove and memset in ld.so don't use IFUNC, don't put SSE2, AVX and AVX512 memmove and memset in ld.so. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: Skip if not in libc. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise.
2016-04-03Fix memmove-vec-unaligned-erms.SH.J. Lu2-24/+39
__mempcpy_erms and __memmove_erms can't be placed between __memmove_chk and __memmove, since that breaks __memmove_chk. Don't check source == destination first since it is less common. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: (__mempcpy_erms, __memmove_erms): Moved before __mempcpy_chk with unaligned_erms. (__memmove_erms): Skip if source == destination. (__memmove_unaligned_erms): Don't check source == destination first.
2016-04-01Remove Fast_Copy_Backward from Intel Core processorsH.J. Lu2-5/+6
Intel Core i3, i5 and i7 processors have fast unaligned copy and copy backward is ignored. Remove Fast_Copy_Backward from Intel Core processors to avoid confusion. * sysdeps/x86/cpu-features.c (init_cpu_features): Don't set bit_arch_Fast_Copy_Backward for Intel Core processors.
2016-04-01Use PTR_ALIGN_DOWN on strcspn and strspnAdhemerval Zanella3-2/+10
Tested on aarch64. * string/strcspn.c (strcspn): Use PTR_ALIGN_DOWN. * string/strspn.c (strspn): Likewise.
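PTR_ALIGN_DOWN rounds a pointer down to a multiple of a given alignment. A minimal sketch of that arithmetic, using a hypothetical `align_down` helper on integer addresses rather than glibc's macro, and assuming the alignment is a power of two:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round ADDR down to a multiple of ALIGN.
   ALIGN must be a nonzero power of two.  */
static uintptr_t
align_down (uintptr_t addr, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  /* Clearing the low bits yields the largest multiple of ALIGN that
     does not exceed ADDR.  */
  return addr & ~(align - 1);
}
```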
2016-04-01Test 64-byte alignment in memset benchtestH.J. Lu2-1/+12
Add 64-byte alignment tests in memset benchtest for 64-byte vector registers. * benchtests/bench-memset.c (do_test): Support 64-byte alignment. (test_main): Test 64-byte alignment.
2016-04-01Test 64-byte alignment in memmove benchtestH.J. Lu2-0/+13
Add 64-byte alignment tests in memmove benchtest for 64-byte vector registers. * benchtests/bench-memmove.c (test_main): Test 64-byte alignment.
2016-04-01Test 64-byte alignment in memcpy benchtestH.J. Lu2-0/+12
Add 64-byte alignment tests in memcpy benchtest for 64-byte vector registers. * benchtests/bench-memcpy.c (test_main): Test 64-byte alignment.
2016-04-01Remove powerpc64 strspn, strcspn, and strpbrk implementationAdhemerval Zanella4-406/+4
This patch removes the powerpc64 optimized strspn, strcspn, and strpbrk assembly implementations now that the default C ones implement the same strategy. On internal glibc benchtests the current implementations show similar performance with -O2. Tested on powerpc64le (POWER8). * sysdeps/powerpc/powerpc64/strcspn.S: Remove file. * sysdeps/powerpc/powerpc64/strpbrk.S: Remove file. * sysdeps/powerpc/powerpc64/strspn.S: Remove file.
2016-04-01Improve generic strpbrk performanceAdhemerval Zanella4-68/+36
Now that strcspn has a faster implementation, it is faster to just use it with some return tests than to reimplement strpbrk itself. As for strcspn optimization, it is generally at least 10 times faster than the existing implementation on bench-strspn on a few AArch64 implementations. Also the string/bits/string2.h inlines no longer make sense, as the current implementation will already implement most of the optimizations. Tested on x86_64, i386, and aarch64. * string/strpbrk.c (strpbrk): Rewrite function. * string/bits/string2.h (strpbrk): Use __builtin_strpbrk. (__strpbrk_c2): Likewise. (__strpbrk_c3): Likewise. * string/string-inlines.c [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strpbrk_c2): Likewise. [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strpbrk_c3): Likewise.
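The idea of building strpbrk on top of strcspn can be sketched as below. This is an illustration of the approach, not a copy of string/strpbrk.c; the name `strpbrk_sketch` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: skip the prefix that contains no character from
   ACCEPT, then test whether the end of the string was reached.  */
static char *
strpbrk_sketch (const char *s, const char *accept)
{
  s += strcspn (s, accept);
  return *s != '\0' ? (char *) s : NULL;
}
```

For example, `strpbrk_sketch ("hello world", "ow")` points at the first `'o'`, while a call with no matching character returns `NULL`.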
2016-04-01Improve generic strspn performanceAdhemerval Zanella4-86/+97
As for strcspn, this patch improves strspn performance using a much faster algorithm. It first constructs a 256-entry table based on the accept string and then uses it as a lookup table for the input string. As for strcspn optimization, it is generally at least 10 times faster than the existing implementation on bench-strspn on a few AArch64 implementations. Also the string/bits/string2.h inlines no longer make sense, as the current implementation will already implement most of the optimizations. Tested on x86_64, i686, and aarch64. * string/strspn.c (strspn): Rewrite function. * string/bits/string2.h (strspn): Use __builtin_strspn. (__strspn_c1): Remove inline function. (__strspn_c2): Likewise. (__strspn_c3): Likewise. * string/string-inlines.c [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strspn_c1): Add compatibility symbol. [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strspn_c2): Likewise. [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strspn_c3): Likewise.
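The table-based approach described above can be sketched as follows. This is a simplified illustration under the name `strspn_sketch` (hypothetical); the real string/strspn.c may differ in details such as loop unrolling and special-casing short accept strings.

```c
#include <stddef.h>

/* Hypothetical sketch of a table-driven strspn.  */
static size_t
strspn_sketch (const char *str, const char *accept)
{
  /* One entry per possible byte value; nonzero means "accepted".
     The entry for '\0' stays 0, so the scan stops at the terminator.  */
  unsigned char table[256] = { 0 };
  for (const unsigned char *a = (const unsigned char *) accept;
       *a != '\0'; a++)
    table[*a] = 1;

  /* Count leading bytes of STR that are present in the table.  */
  const unsigned char *s = (const unsigned char *) str;
  size_t n = 0;
  while (table[s[n]])
    n++;
  return n;
}
```

Building the table costs time proportional to the length of the accept string, after which each input byte needs a single lookup instead of a scan over accept.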
2016-04-01Improve generic strcspn performanceWilco Dijkstra6-96/+102
Improve strcspn performance using a much faster algorithm. It is kept simple so it works well on most targets. It is generally at least 10 times faster than the existing implementation on bench-strcspn on a few AArch64 implementations, and for some tests 100 times as fast (repeatedly calling strchr on a small string is extremely slow...). In fact the string/bits/string2.h inlines no longer make sense, as GCC already uses strlen if reject is an empty string, strchrnul is 5 times as fast as __strcspn_c1, while __strcspn_c2 and __strcspn_c3 are slower than the strcspn main loop for large strings (though reject length 2-4 could be special cased in the future to gain even more performance). Tested on x86_64, i686, and aarch64. * string/Versions (libc): Add GLIBC_2.24. * string/strcspn.c (strcspn): Rewrite function. * string/bits/string2.h (strcspn): Use __builtin_strcspn. (__strcspn_c1): Remove inline function. (__strcspn_c2): Likewise. (__strcspn_c3): Likewise. * string/string-inlines.c [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strcspn_c1): Add compatibility symbol. [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strcspn_c2): Likewise. [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strcspn_c3): Likewise. * sysdeps/i386/string-inlines.c: Include generic string-inlines.c.
2016-04-01S390: Use ahi instead of aghi in 32bit _dl_runtime_resolve.Stefan Liebler2-1/+6
This patch uses ahi instead of aghi in 32bit _dl_runtime_resolve to adjust the stack pointer. This is no functional change, but a cosmetic one. ChangeLog: * sysdeps/s390/s390-32/dl-trampoline.h (_dl_runtime_resolve): Use ahi instead of aghi to adjust stack pointer.
2016-03-31Increase internal precision of ldbl-128ibm decimal printf [BZ #19853]Paul E. Murphy3-11/+42
When the signs differ, the precision of the conversion sometimes drops below 106 bits. This strategy is identical to the hexadecimal variant. I've refactored tst-sprintf3 to enable testing a value with more than 30 significant digits in order to demonstrate this failure and its solution. Additionally, this implicitly fixes a typo in the shift quantities when subtracting from the high mantissa to compute the difference.
2016-03-31Add x86-64 memset with unaligned store and rep stosbH.J. Lu7-1/+358
Implement x86-64 memset with unaligned store and rep stosb. Support 16-byte, 32-byte and 64-byte vector register sizes. A single file provides 2 implementations of memset, one with rep stosb and the other without rep stosb. They share the same code when size is between 2 times the vector register size and REP_STOSB_THRESHOLD, which defaults to 2KB. Key features: 1. Use overlapping store to avoid branch. 2. For size <= 4 times the vector register size, fully unroll the loop. 3. For size > 4 times the vector register size, store 4 times the vector register size at a time. [BZ #19881] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and memset-avx512-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned, __memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned, __memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned, __memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned, __memset_sse2_unaligned_erms, __memset_erms, __memset_avx2_unaligned, __memset_avx2_unaligned_erms, __memset_avx512_unaligned_erms and __memset_avx512_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New file. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Likewise.
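The "overlapping store" technique from the key features can be illustrated in portable C with 8-byte words standing in for vector registers. This is a sketch of the idea only, not the assembly in memset-vec-unaligned-erms.S; `memset_8_to_16` is a hypothetical name and the function assumes 8 <= n <= 16.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: fill N bytes (8 <= N <= 16) with two 8-byte
   stores and no branch on the exact value of N.  */
static void
memset_8_to_16 (unsigned char *dst, int c, size_t n)
{
  /* Replicate the fill byte into all 8 bytes of a word.  */
  uint64_t pattern = 0x0101010101010101ULL * (unsigned char) c;

  /* One store covers the first 8 bytes, the other the last 8 bytes.
     For N < 16 the two stores overlap in the middle, which is harmless
     because both write the same value.  */
  memcpy (dst, &pattern, 8);
  memcpy (dst + n - 8, &pattern, 8);
}
```

The same trick scales to vector widths: any size between one and two vector widths can be covered by one store at the start and one store ending at the last byte.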
2016-03-31Add x86-64 memmove with unaligned load/store and rep movsbH.J. Lu7-1/+634
Implement x86-64 memmove with unaligned load/store and rep movsb. Support 16-byte, 32-byte and 64-byte vector register sizes. When size <= 8 times the vector register size, there is no check for address overlap between source and destination. Since the overhead of the overlap check is small when size > 8 times the vector register size, memcpy is an alias of memmove. A single file provides 2 implementations of memmove, one with rep movsb and the other without rep movsb. They share the same code when size is between 2 times the vector register size and REP_MOVSB_THRESHOLD, which is 2KB for the 16-byte vector register size and scaled up for larger vector register sizes. Key features: 1. Use overlapping load and store to avoid branch. 2. For size <= 8 times the vector register size, load all sources into registers and store them together. 3. If there is no address overlap between source and destination, copy from both ends with 4 times the vector register size at a time. 4. If address of destination > address of source, backward copy 8 times the vector register size at a time. 5. Otherwise, forward copy 8 times the vector register size at a time. 6. Use rep movsb only for forward copy. Avoid slow backward rep movsb by falling back to backward copy 8 times the vector register size at a time. 7. Skip when address of destination == address of source. [BZ #19776] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and memmove-avx512-unaligned-erms. 
* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memmove_chk_avx512_unaligned_2, __memmove_chk_avx512_unaligned_erms, __memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms, __memmove_chk_sse2_unaligned_2, __memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2, __memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2, __memmove_avx512_unaligned_erms, __memmove_erms, __memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms, __memcpy_chk_avx512_unaligned_2, __memcpy_chk_avx512_unaligned_erms, __memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms, __memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms, __memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms, __memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms, __memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms, __memcpy_erms, __mempcpy_chk_avx512_unaligned_2, __mempcpy_chk_avx512_unaligned_erms, __mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms, __mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms, __mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms, __mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms, __mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and __mempcpy_erms. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New file. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Likewise.
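The load-everything-then-store and direction-selection rules from the memmove commit above can be sketched in portable C. This is a simplified illustration under stated assumptions, not the committed assembly: `VEC_SIZE`, `vec_t`, `load_vec`, `store_vec` and `sketch_memmove` are hypothetical names, the loops move one vector per iteration instead of 4 or 8, and the rep movsb path is omitted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical vector width; the commit supports 16, 32 and 64 bytes. */
enum { VEC_SIZE = 16 };

/* Stand-in for a vector register. */
typedef struct { unsigned char b[VEC_SIZE]; } vec_t;

/* Stand-ins for unaligned vector load and store. The memcpy calls are
   safe because a vec_t never overlaps the user buffers. */
static vec_t load_vec(const unsigned char *p)
{
    vec_t v;
    memcpy(v.b, p, VEC_SIZE);
    return v;
}

static void store_vec(unsigned char *p, vec_t v)
{
    memcpy(p, v.b, VEC_SIZE);
}

void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (d == s || n == 0)
        return dst;                     /* nothing to move */

    if (n < VEC_SIZE) {
        /* Small case: bounce through a local buffer in this sketch. */
        unsigned char tmp[VEC_SIZE];
        memcpy(tmp, s, n);
        memcpy(d, tmp, n);
        return dst;
    }
    if (n <= 2 * VEC_SIZE) {
        /* Load both (possibly overlapping) vectors before storing,
           so no overlap check between src and dst is needed. */
        vec_t head = load_vec(s);
        vec_t tail = load_vec(s + n - VEC_SIZE);
        store_vec(d, head);
        store_vec(d + n - VEC_SIZE, tail);
        return dst;
    }
    if (d > s) {
        /* Destination above source: copy backward. The first vector
           is loaded up front and stored last to cover the remainder. */
        vec_t head = load_vec(s);
        for (size_t i = n; i >= VEC_SIZE; i -= VEC_SIZE)
            store_vec(d + i - VEC_SIZE, load_vec(s + i - VEC_SIZE));
        store_vec(d, head);
    } else {
        /* Destination below source: copy forward. The last vector
           is loaded up front and stored last to cover the remainder. */
        vec_t tail = load_vec(s + n - VEC_SIZE);
        for (size_t i = 0; i + VEC_SIZE <= n; i += VEC_SIZE)
            store_vec(d + i, load_vec(s + i));
        store_vec(d + n - VEC_SIZE, tail);
    }
    return dst;
}
```

In both directions each store lands only on bytes that have already been read or were saved in a register beforehand, which is why the copy stays correct for overlapping buffers.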
2016-03-31 S390: Extend structs La_s390_regs / La_s390_retval with vector-registers. Stefan Liebler 4 -65/+136
Starting with z13, vector registers can also occur as argument registers. Thus the passed input/output register structs for la_s390_[32|64]_gnu_plt[enter|exit] functions should reflect those new registers. This patch extends these structs La_s390_regs and La_s390_retval and adjusts _dl_runtime_profile() to handle those fields when running on a z13 machine. ChangeLog: * sysdeps/s390/bits/link.h: (La_s390_vr) New typedef. (La_s390_32_regs): Append vector register lr_v24-lr_v31. (La_s390_64_regs): Likewise. (La_s390_32_retval): Append vector register lrv_v24. (La_s390_64_retval): Likewise. * sysdeps/s390/s390-32/dl-trampoline.h (_dl_runtime_profile): Handle extended structs La_s390_32_regs and La_s390_32_retval. * sysdeps/s390/s390-64/dl-trampoline.h (_dl_runtime_profile): Handle extended structs La_s390_64_regs and La_s390_64_retval.
2016-03-31 S390: Save and restore fprs/vrs while resolving symbols. Stefan Liebler 7 -248/+516
On s390, no fprs/vrs were saved while resolving a symbol via _dl_runtime_resolve/_dl_runtime_profile. According to the ABI, the fpr-arguments are defined as call clobbered. In leaf-functions, gcc 4.9 and newer can use fprs for saving/restoring gprs instead of saving them to the stack. If gcc does this in one of the resolver-functions, then the floating point arguments of a library-function are invalid for the first library-function-call. Thus, this patch saves/restores the fprs around the resolving code. The same could occur for vector registers. Furthermore, an ifunc-resolver could also clobber the vector/floating point argument registers. Thus this patch provides the further variants _dl_runtime_resolve_vx/ _dl_runtime_profile_vx, which are used if the kernel reports that we are running on a machine with vector registers. Furthermore, if _dl_runtime_profile calls _dl_call_pltexit, the pointers to the inregs-/outregs-structs were set up incorrectly. Now they point to the correct location in the stack-frame. Before branching back to the caller, the return values are now restored instead of containing the return values of the _dl_call_pltexit() call. On s390-32, an endless loop occurred if _dl_call_pltexit() should be called. Now, this code-path branches to this function instead of just after the preceding basr-instruction. ChangeLog: * sysdeps/s390/s390-32/dl-trampoline.S: Include dl-trampoline.h twice to create a non-vector/vector version for _dl_runtime_resolve and _dl_runtime_profile. Move implementation to ... * sysdeps/s390/s390-32/dl-trampoline.h: ... here. (_dl_runtime_resolve) Save and restore fpr/vrs. (_dl_runtime_profile) Save and restore vrs and fix some issues if _dl_call_pltexit is called. * sysdeps/s390/s390-32/dl-machine.h (elf_machine_runtime_setup): Choose the correct resolver function if running on a machine with vx. 
* sysdeps/s390/s390-64/dl-trampoline.S: Include dl-trampoline.h twice to create a non-vector/vector version for _dl_runtime_resolve and _dl_runtime_profile. Move implementation to ... * sysdeps/s390/s390-64/dl-trampoline.h: ... here. (_dl_runtime_resolve) Save and restore fpr/vrs. (_dl_runtime_profile) Save and restore vrs and fix some issues * sysdeps/s390/s390-64/dl-machine.h: (elf_machine_runtime_setup): Choose the correct resolver function if running on a machine with vx.
2016-03-31 Fix tst-dlsym-error build Adhemerval Zanella 2 -0/+5
This patch fixes the build of the new test tst-dlsym-error on aarch64 (and possibly other architectures as well), which failed due to a missing strchrnul definition. * elf/tst-dlsym-error.c: Include <string.h> for strchrnul.
2016-03-31 Report dlsym, dlvsym lookup errors using dlerror [BZ #19509] Florian Weimer 4 -2/+125
* elf/dl-lookup.c (_dl_lookup_symbol_x): Report error even if skip_map != NULL. * elf/tst-dlsym-error.c: New file. * elf/Makefile (tests): Add tst-dlsym-error. (tst-dlsym-error): Link against libdl.
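From the caller's side, the behavior this commit concerns is the usual dlsym/dlerror checking pattern. The sketch below is an assumption-laden illustration, not the contents of elf/tst-dlsym-error.c: `sketch_dlsym_failed` is a hypothetical helper, it looks symbols up through a `dlopen (NULL, ...)` handle rather than exercising the skip_map != NULL path named in the ChangeLog, and on older glibc versions it may need to be linked with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 and copies the dlerror text into msg
   if looking up NAME fails, 0 if the lookup succeeds, -1 if the main
   program handle cannot be opened. */
int sketch_dlsym_failed(const char *name, char *msg, size_t msgsz)
{
    void *handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL)
        return -1;

    dlerror();                       /* clear any stale error state */
    (void) dlsym(handle, name);      /* NULL may be a valid symbol value */
    const char *err = dlerror();     /* non-NULL only if the lookup failed */

    int failed = (err != NULL);
    if (msgsz > 0)
        /* Copy before calling any other dl* function, since the string
           returned by dlerror may not stay valid afterwards. */
        snprintf(msg, msgsz, "%s", failed ? err : "");

    dlclose(handle);
    return failed;
}
```

Checking dlerror rather than the dlsym return value is the reliable test, because a symbol's value can legitimately be NULL.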