aboutsummaryrefslogtreecommitdiff
path: root/locale
AgeCommit message (Collapse)AuthorFilesLines
2024-12-05Fix and sort variables in MakefilesH.J. Lu1-1/+1
Fix variables in Makefiles: 1. There is a tab, not a space, between "variable" and =, +=, :=. 2. The last entry doesn't have a trailing \. and sort them. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-11-25Silence most -Wzero-as-null-pointer-constant diagnosticsAlejandro Colomar3-8/+8
Replace 0 by NULL and {0} by {}. Omit a few cases that aren't so trivial to fix. Link: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117059> Link: <https://software.codidact.com/posts/292718/292759#answer-292759> Signed-off-by: Alejandro Colomar <alx@kernel.org>
2024-10-14locale: Fix some spelling typosJonathan Wakely7-7/+7
Replace several cases of "Ingore" with "Ignore". Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-08-16support: Use macros for *stat wrappersFlorian Weimer1-1/+1
Macros will automatically use the correct types, without having to fiddle with internal glibc macros. It's also impossible to get the types wrong due to aliasing because support_check_stat_fd and support_check_stat_path do not depend on the struct stat* types. The changes reveal some inconsistencies in tests. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-06-17Define ISO 639-3 "ltg" (Latgalian) and add ltg_LV localeMike FABIAN1-0/+1
Resolves: BZ # 31411 References: https://iso639-3.sil.org/code/ltg https://en.wikipedia.org/wiki/Latgalian_language https://github.com/unicode-org/cldr/blob/main/common/main/ltg.xml
2024-04-22locale: Handle loading a missing locale twice (Bug 14247)Carlos O'Donell2-4/+17
Delay setting file->decided until the data has been successfully loaded by _nl_load_locale(). If the function fails to load the data then we must return and error and leave decided untouched to allow the caller to attempt to load the data again at a later time. We should not set decided to 1 early in the function since doing so may prevent attempting to load it again. We want to try loading it again because that allows an open to fail and set errno correctly. On the other side of this problem is that if we are called again with the same inputs we will fetch the cached version of the object and carry out no open syscalls and that fails to set errno so we must set errno to ENOENT in that case. There is a second code path that has to be handled where the name of the locale matches but the codeset doesn't match. These changes ensure that errno is correctly set on failure in all the return paths in _nl_find_locale(). Adds tst-locale-loadlocale to cover the bug. No regressions on x86_64. Co-authored-by: Jeff Law <law@redhat.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-03-11duplocale: protect use of global locale (bug 23970)Andreas Schwab1-6/+8
Protect the global locale from being modified while we compute the size of the locale category names. That allows the use of the global locale in a single thread, while all other threads use the thread safe locale functions.
2024-01-18Define ISO 639-3 "ssy" (Saho)Mike FABIAN1-0/+1
Related: BZ # 19956 References: https://iso639-3.sil.org/code/ssy https://en.wikipedia.org/wiki/Saho_language
2024-01-17Define ISO 639-3 "gbm" (Garhwali)Mike FABIAN1-0/+1
Related: BZ # 19479 References: https://iso639-3.sil.org/code/gbm https://en.wikipedia.org/wiki/Garhwali_language
2024-01-11Define ISO 639-3 "glk" (Gilaki)Mike FABIAN1-0/+1
Resolves: BZ # 27163 References: https://iso639-3.sil.org/code/glk https://en.wikipedia.org/wiki/Gilaki_language
2024-01-10locale: Sort Makefile variables.Carlos O'Donell1-29/+97
Sort Makefile variables using scrips/sort-makefile-lines.py. No regressions on x86_64.
2024-01-09localedata: add new locale zgh_MAMike FABIAN1-0/+1
Resolves: BZ # 12908 https://iso639-3.sil.org/code/zgh
2024-01-01Update copyright dates not handled by scripts/update-copyrightsPaul Eggert2-2/+2
I've updated copyright dates in glibc for 2024. This is the patch for the changes not generated by scripts/update-copyrights and subsequent build / regeneration of generated files.
2024-01-01Update copyright in generated files by running "make"Paul Eggert2-2/+2
2024-01-01Update copyright dates with scripts/update-copyrightsPaul Eggert107-107/+107
2023-10-30crypt: Remove libcrypt supportAdhemerval Zanella6-5/+450
All the crypt related functions, cryptographic algorithms, and make requirements are removed, with only the exception of md5 implementation which is moved to locale folder since it is required by localedef for integrity protection (libc's locale-reading code does not check these, but localedef does generate them). Besides thec code itself, both internal documentation and the manual is also adjusted. This allows to remove both --enable-crypt and --enable-nss-crypt configure options. Checked with a build for all affected ABIs. Co-authored-by: Zack Weinberg <zack@owlfolio.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-05-24locale/programs/locarchive.c: fix warn unused resultFrédéric Bérat1-8/+16
Fix unused result warnings, detected when _FORTIFY_SOURCE is enabled in glibc. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-04-26locale/programs/locarchive.c: Remove unnecessary check in add_locale_archiveFrédéric Bérat1-1/+1
Since asprintf is called "if (mask & XPG_NORM_CODESET)" there is no point in checking the mask again within the asprintf call. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-03-27Move libc_freeres_ptrs and libc_subfreeres to hidden/weak functionsAdhemerval Zanella Netto3-5/+5
They are both used by __libc_freeres to free all library malloc allocated resources to help tooling like mtrace or valgrind with memory leak tracking. The current scheme uses assembly markers and linker script entries to consolidate the free routine function pointers in the RELRO segment and to be freed buffers in BSS. This patch changes it to use specific free functions for libc_freeres_ptrs buffers and call the function pointer array directly with call_function_static_weak. It allows the removal of both the internal macros and the linker script sections. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-02-16C2x strtol binary constant handlingJoseph Myers1-0/+10
C2x adds binary integer constants starting with 0b or 0B, and supports those constants in strtol-family functions when the base passed is 0 or 2. Implement that strtol support for glibc. As discussed at <https://sourceware.org/pipermail/libc-alpha/2020-December/120414.html>, this is incompatible with previous C standard versions, in that such an input string starting with 0b or 0B was previously required to be parsed as 0 (with the rest of the string unprocessed). Thus, as proposed there, this patch adds 20 new __isoc23_* functions with appropriate header redirection support. This patch does *not* do anything about scanf %i (which will need 12 new functions per long double variant, so 12, 24 or 36 depending on the glibc configuration), instead leaving that for a future patch. The function names would remain as __isoc23_* even if C2x ends up published in 2024 rather than 2023. Making this change leads to the question of what should happen to internal uses of these functions in glibc and its tests. The header redirection (which applies for _GNU_SOURCE or any other feature test macros enabling C2x features) has the effect of redirecting internal uses but without those uses then ending up at a hidden alias (see the comment in include/stdio.h about interaction with libc_hidden_proto). It seems desirable for the default for internal uses to be the same versions used by normal code using _GNU_SOURCE, so rather than doing anything to disable that redirection, similar macro definitions to those in include/stdio.h are added to the include/ headers for the new functions. Given that the default for uses in glibc is for the redirections to apply, the next question is whether the C2x semantics are correct for all those uses. Uses with the base fixed to 10, 16 or any other value other than 0 or 2 can be ignored. I think this leaves the following internal uses to consider (an important consideration for review of this patch will be both whether this list is complete and whether my conclusions on all entries in it are correct): benchtests/bench-malloc-simple.c benchtests/bench-string.h elf/sotruss-lib.c math/libm-test-support.c nptl/perf.c nscd/nscd_conf.c nss/nss_files/files-parse.c posix/tst-fnmatch.c posix/wordexp.c resolv/inet_addr.c rt/tst-mqueue7.c soft-fp/testit.c stdlib/fmtmsg.c support/support_test_main.c support/test-container.c sysdeps/pthread/tst-mutex10.c I think all of these places are OK with the new semantics, except for resolv/inet_addr.c, where the POSIX semantics of inet_addr do not allow for binary constants; thus, I changed that file (to use __strtoul_internal, whose semantics are unchanged) and added a test for this case. In the case of posix/wordexp.c I think accepting binary constants is OK since POSIX explicitly allows additional forms of shell arithmetic expressions, and in stdlib/fmtmsg.c SEV_LEVEL is not in POSIX so again I think accepting binary constants is OK. Functions such as __strtol_internal, which are only exported for compatibility with old binaries from when those were used in inline functions in headers, have unchanged semantics; the __*_l_internal versions (purely internal to libc and not exported) have a new argument to specify whether to accept binary constants. As well as for the standard functions, the header redirection also applies to the *_l versions (GNU extensions), and to legacy functions such as strtoq, to avoid confusing inconsistency (the *q functions redirect to __isoc23_*ll rather than needing their own __isoc23_* entry points). For the functions that are only declared with _GNU_SOURCE, this means the old versions are no longer available for normal user programs at all. An internal __GLIBC_USE_C2X_STRTOL macro is used to control the redirections in the headers, and cases in glibc that wish to avoid the redirections - the function implementations themselves and the tests of the old versions of the GNU functions - then undefine and redefine that macro to allow the old versions to be accessed. (There would of course be greater complexity should we wish to make any of the old versions into compat symbols / avoid them being defined at all for new glibc ABIs.) strtol_l.c has some similarity to strtol.c in gnulib, but has already diverged some way (and isn't listed at all at https://sourceware.org/glibc/wiki/SharedSourceFiles unlike strtoll.c and strtoul.c); I haven't made any attempts at gnulib compatibility in the changes to that file. I note incidentally that inttypes.h and wchar.h are missing the __nonnull present on declarations of this family of functions in stdlib.h; I didn't make any changes in that regard for the new declarations added.
2023-01-11locale: Use correct buffer size for utf8_sequence_error [BZ #19444]Adhemerval Zanella1-1/+1
The buffer used by snprintf might not be large enough for all possible inputs, as indicated by gcc with -O1: ../locale/programs/linereader.c: In function ‘utf8_sequence_error’: ../locale/programs/linereader.c:713:58: error: ‘%02x’ directive output may be truncated writing between 2 and 8 bytes into a region of size between 1 and 13 [-Werror=format-truncation=] 713 | snprintf (buf, sizeof (buf), "0x%02x 0x%02x 0x%02x 0x%02x", | ^~~~ ../locale/programs/linereader.c:713:34: note: directive argument in the range [0, 2147483647] 713 | snprintf (buf, sizeof (buf), "0x%02x 0x%02x 0x%02x 0x%02x", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../locale/programs/linereader.c:713:5: note: ‘snprintf’ output between 20 and 38 bytes into a destination of size 30 713 | snprintf (buf, sizeof (buf), "0x%02x 0x%02x 0x%02x 0x%02x", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 714 | ch1, ch2, ch3, ch4); | ~~~~~~~~~~~~~~~~~~~ Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-01-06Update copyright dates not handled by scripts/update-copyrightsJoseph Myers2-2/+2
I've updated copyright dates in glibc for 2023. This is the patch for the changes not generated by scripts/update-copyrights and subsequent build / regeneration of generated files.
2023-01-06Update copyright dates with scripts/update-copyrightsJoseph Myers107-107/+107
2023-01-06Remove trailing whitespaceJoseph Myers1-1/+1
For some reason this causes a pre-commit check error for a copyright date update commit, even though that commit doesn't touch anything near the line with this whitespace.
2022-10-05locale: prevent maybe-uninitialized errors with -Os [BZ #19444]Martin Jansa1-0/+7
Fixes following error when building with -Os: | In file included from strcoll_l.c:43: | strcoll_l.c: In function '__strcoll_l': | ../locale/weight.h:31:26: error: 'seq2.back_us' may be used uninitialized in this function [-Werror=maybe-uninitialized] | int_fast32_t i = table[*(*cpp)++]; | ^~~~~~~~~ | strcoll_l.c:304:18: note: 'seq2.back_us' was declared here | coll_seq seq1, seq2; | ^~~~ | In file included from strcoll_l.c:43: | ../locale/weight.h:31:26: error: 'seq1.back_us' may be used uninitialized in this function [-Werror=maybe-uninitialized] | int_fast32_t i = table[*(*cpp)++]; | ^~~~~~~~~ | strcoll_l.c:304:12: note: 'seq1.back_us' was declared here | coll_seq seq1, seq2; | ^~~~ Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
2022-09-22Use '%z' instead of '%Z' on printf functionsAdhemerval Zanella Netto2-6/+6
The Z modifier is a nonstandard synonymn for z (that predates z itself) and compiler might issue an warning for in invalid conversion specifier. Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-07-22locale: Optimize tst-localedef-path-normAdhemerval Zanella2-111/+128
The locale generation are issues in parallel to try speed locale generation. The maximum number of jobs are limited to the online CPU (in hope to not overcommit on environments with lower cores than tests). On a Ryzen 9, the test execution improves from ~6.7s to ~1.4s. Tested-by: Mark Wielaard <mark@klomp.org>
2022-07-05localedef: Support building for older C standardsFlorian Weimer1-9/+11
Fixes commit b15538d77c6a7893c8bb42831dcd3a1a12b727d4 ("locale: localdef input files are now encoded in UTF-8").
2022-07-05locale: localdef input files are now encoded in UTF-8Florian Weimer1-11/+133
Previously, they were assumed to be in ISO-8859-1, and that the output charset overlapped with ISO-8859-1 for the characters actually used. However, this did not work as intended on many architectures even for an ISO-8859-1 output encoding because of the char signedness bug in lr_getc. Therefore, this commit switches to UTF-8 without making provisions for backwards compatibility. The following Elisp code can be used to convert locale definition files to UTF-8: (defun glibc/convert-localedef (from to) (interactive "r") (save-excursion (save-restriction (narrow-to-region from to) (goto-char (point-min)) (save-match-data (while (re-search-forward "<U\\([0-9a-fA-F]+\\)>" nil t) (let* ((codepoint (string-to-number (match-string 1) 16)) (converted (cond ((memq codepoint '(?/ ?\ ?< ?>)) (string ?/ codepoint)) ((= codepoint ?\") "<U0022>") (t (string codepoint))))) (replace-match converted t))))))) Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
2022-07-05locale: Introduce translate_unicode_codepoint into linereader.cFlorian Weimer1-82/+85
This will permit reusing the Unicode character processing for different character encodings, not just the current <U...> encoding. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
2022-07-05locale: Fix signed char bug in lr_getcFlorian Weimer1-1/+1
The array lr->buf contains characters, which can be signed. A 0xff byte in the input could be incorrectly reported as EOF. More importantly, get_string in linereader.c converts a signed input byte to a Unicode code point using ADDWC ((uint32_t) ch), under the assumption that this decodes the ISO-8859-1 input encoding. If char is signed, this does not give the correct result. This means that ISO-8859-1 input files for localedef are not actually supported, contrary to the comment in get_string. This is a happy accident because we can therefore change the file encoding to UTF-8 without impacting backwards compatibility. While at it, remove the \32 check for MS-DOS end-of-file character (^Z). Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
2022-07-05locale: Turn ADDC and ADDS into functions in linereader.cFlorian Weimer1-99/+104
And introduce struct lr_buffer. The functions addc and adds can be called from functions, enabling subsequent refactoring. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
2022-05-23locale: Add more cached data to LC_CTYPEFlorian Weimer3-5/+100
This data will be used in number formatting. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-05-23locale: Remove private union from struct __locale_dataFlorian Weimer13-19/+19
This avoids an alias violation later. This commit also fixes an incorrect double-checked locking idiom in _nl_init_era_entries. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-05-23locale: Remove cleanup function pointer from struct __localedataFlorian Weimer5-21/+25
We can call the cleanup functions directly from _nl_unload_locale if we pass the category to it. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-05-23locale: Call _nl_unload_locale from _nl_archive_subfreeresFlorian Weimer1-7/+1
The function performs the same steps for ld_archive locales (mapped from an archive), and this code is not performance-critical, so the specialization does not add value. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-04-13Replace {u}int_fast{16|32} with {u}int32_tNoah Goldstein2-2/+2
On 32-bit machines this has no affect. On 64-bit machines {u}int_fast{16|32} are set as {u}int64_t which is often not ideal. Particularly x86_64 this change both saves code size and may save instruction cost. Full xcheck passes on x86_64.
2022-04-07Add rif_MA locale [BZ #27781]Ilyahoo Proshel1-0/+1
Resolves: BZ #27781
2022-03-31locale: Remove set but unused variable on ld-collate.cAdhemerval Zanella1-8/+1
Checked on x86_64-linux-gnu and i686-linux-gnu.
2022-03-23locale: Remove ununsed wctype_table_get functionAdhemerval Zanella1-27/+0
2022-03-14Define ISO 639-3 "tok" [BZ #28950]Carlos O'Donell1-0/+1
Effective 2022-01-20 via SIL request 2021-043 the identifier "tok" is now active for Toki Pona in the code set for ISO 639-3. References: https://iso639-3.sil.org/code/tok https://iso639-3.sil.org/sites/iso639-3/files/change_requests/2021/2021-043.pdf No regressions on x86_64.
2022-02-25localedef: Update LC_MONETARY handling (Bug 28845)Carlos O'Donell1-36/+146
ISO C17, POSIX Issue 7, and ISO 30112 all allow the char* types to be empty strings i.e. "", integer or char values to be -1 or CHAR_MAX respectively, with the exception of decimal_point which must be non-empty in ISO C. Note that the defaults for mon_grouping vary, but are functionaly equivalent e.g. "\177" (no further grouping reuqired) vs. "" (no grouping defined for all groups). We include a broad comment talking about harmonizing ISO C, POSIX, ISO 30112, and the default C/POSIX locale for glibc. We reorder all setting based on locale/categories.def order. We soften all missing definitions from errors to warnings when defaults exist. Given that ISO C, POSIX and ISO 30112 allow the empty string we change LC_MONETARY handling of mon_decimal_point to allow the empty string. If mon_decimal_point is not defined at all then we pick the existing legacy glibc default value of <U002E> i.e. ".". We also set the default for mon_thousands_sep_wc at the same time as mon_thousands_sep, but this is not a change in behaviour, it is always either a matching value or L'\0', but if in the future we change the default to a non-empty string we would need to update both at the same time. Tested on x86_64 and i686 without regressions. Tested with install-locale-archive target. Tested with install-locale-files target. Reviewed-by: DJ Delorie <dj@redhat.com>
2022-02-24localedef: Handle symbolic links when generating locale-archiveArjun Shankar1-1/+1
Whenever locale data for any locale included symbolic links, localedef would throw the error "incomplete set of locale files" and exclude it from the generated locale archive. This commit fixes that. Co-authored-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-02-01localedef: Fix handling of empty mon_decimal_point (Bug 28847)Carlos O'Donell1-3/+1
The handling of mon_decimal_point is incorrect when it comes to handling the empty "" value. The existing parser in monetary_read() will correctly handle setting the non-wide-character value and the wide-character value e.g. STR_ELEM_WC(mon_decimal_point) if they are set in the locale definition. However, in monetary_finish() we have conflicting TEST_ELEM() which sets a default value (if the locale definition doesn't include one), and subsequent code which looks for mon_decimal_point to be NULL to issue a specific error message and set the defaults. The latter is unused because TEST_ELEM() always sets a default. The simplest solution is to remove the TEST_ELEM() check, and allow the existing check to look to see if mon_decimal_point is NULL and set an appropriate default. The final fix is to move the setting of mon_decimal_point_wc so it occurs only when mon_decimal_point is being set to a default, keeping both values consistent. There is no way to tell the difference between mon_decimal_point_wc having been set to the empty string and not having been defined at all, for that distinction we must use mon_decimal_point being NULL or "", and so we must logically set the default together with mon_decimal_point. Lastly, there are more fixes similar to this that could be made to ld-monetary.c, but we avoid that in order to fix just the code required for mon_decimal_point, which impacts the ability for C.UTF-8 to set mon_decimal_point to "", since without this fix we end up with an inconsistent setting of mon_decimal_point set to "", but mon_decimal_point_wc set to "." which is incorrect. Tested on x86_64 and i686 without regression. Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-01-01Update automatically-generated copyright datesPaul Eggert2-200/+200
These were updated simply by running "make" to regen the files.
2022-01-01Update copyright dates not handled by scripts/update-copyrights.Paul Eggert2-2/+2
I've updated copyright dates in glibc for 2022. This is the patch for the changes not generated by scripts/update-copyrights and subsequent build / regeneration of generated files. As well as the usual annual updates, mainly dates in --version output (minus csu/version.c which previously had to be handled manually but is now successfully updated by update-copyrights), there is a small change to the copyright notice in NEWS which should let NEWS get updated automatically next year. Please remember to include 2022 in the dates for any new files added in future (which means updating any existing uncommitted patches you have that add new files to use the new copyright dates in them).
2022-01-01Update copyright dates with scripts/update-copyrightsPaul Eggert105-105/+105
I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 7061 files FOO. I then removed trailing white space from math/tgmath.h, support/tst-support-open-dev-null-range.c, and sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following obscure pre-commit check failure diagnostics from Savannah. I don't know why I run into these diagnostics whereas others evidently do not. remote: *** 912-#endif remote: *** 913: remote: *** 914- remote: *** error: lines with trailing whitespace found ... remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
2021-12-07localedef: check magic value on archive load [BZ #28650]Aurelien Jarno1-0/+7
localedef currently blindly trust the archive header. When passed an archive file with the wrong endianess, this leads to a segmentation fault: $ localedef --big-endian --list-archive /usr/lib/locale/locale-archive Segmentation fault (core dumped) When passed non-archive files, asserts are reported on the best case, but sometimes it can lead to a segmentation fault: $ localedef --list-archive /bin/true localedef: programs/locarchive.c:1643: show_archive_content: Assertion `used < GET (head->namehash_used)' failed. Aborted (core dumped) $ localedef --list-archive /usr/lib/locale/C.utf8/LC_COLLATE Segmentation fault (core dumped) This patch improves the user experience by looking at the magic value, which is always written, but never checked. It should still be possible to trigger a segmentation fault with crafted files, but this already catch many cases.
2021-09-06locale: Add missing second argument to _Static_assert in C-collate-seq.cFlorian Weimer1-1/+1
2021-09-06Add 'codepoint_collation' support for LC_COLLATE.Carlos O'Donell6-229/+286
Support a new directive 'codepoint_collation' in the LC_COLLATE section of a locale source file. This new directive causes all collation rules to be dropped and instead STRCMP (strcmp or wcscmp) is used for collation of the input character set. This is required to allow for a C.UTF-8 that contains zero collation rules (minimal size) and sorts using code point sorting. To date the only implementation of a locale with zero collation rules is the C/POSIX locale. The C/POSIX locale provides identity tables for _NL_COLLATE_COLLSEQMB and _NL_COLLATE_COLLSEQWC that map to ASCII even though it has zero rules. This has lead to existing fnmatch, regexec, and regcomp implementations that require these tables. It is not correct to use these tables when nrules == 0, but the conservative fix is to provide these tables when nrules == 0. This assures that existing static applications using a new C.UTF-8 locale with 'codepoint_collation' at least have functional range expressions with ASCII e.g. [0-9] or [a-z]. Such static applications would not have the fixes to fnmatch, regexec and regcomp that avoid the use of the tables when nrules == 0. Future fixes to fnmatch, regexec, and regcomp would allow range expressions to use the full set of code points for such ranges. Tested on x86_64 and i686 without regression. Reviewed-by: Florian Weimer <fweimer@redhat.com>