From 390955cbdeb674bead490fc3f74a8a0893ea83cf Mon Sep 17 00:00:00 2001 From: Ulrich Drepper Date: Mon, 11 Jan 1999 20:13:43 +0000 Subject: Update. 1999-01-11 Ulrich Drepper * ctype/Versions [GLIBC_2.0]: Export __ctype32_b. * include/wctype.h: Declare __iswctype. * stdio-common/vfscanf.c (__vfscanf): Use __iswspace instead of iswspace. * wctype/Makefile (routines): Add wcextra_l. * wctype/wcextra.c (iswblank): Implement function here and don't use __iswctype. (__iswblank_l): Move definition to... * wctype/wcextra_l.c: ...here. New file. * wctype/wcfuncs.c: Really implement functions and don't call __iswctype or __towctrans. * wctype/wctype.h: Change isw* and tow* macros. Don't call __iswctype or __towctrans. Instead optimize constant argument case. * iconv/gconv.h: Fix typos. * iconv/skeleton.c: Fix typos. Optimize init function a bit. Correctly emit escape sequence to return to initial state in conversion function. * iconvdata/iso-2022-jp.c (gconv_init): Correctly initialize max_needed_to element. * manual/mbyte.texi: Removed. This is now described in charset.texi. * manual/charset.texi: New file. * manual/Makefile (chapters): Replace mbyte by charset. * manual/ctype.texi: Document wide character functions. * manual/intro.texi: Fix reference to mbyte chapter. * manual/lang.texi: Likewise. * manual/locale.texi: Likewise. * manual/stdio.texi: Likewise. * manual/string.texi: Fix @node line for new charset chapter. * manual/libc.texinfo (UPDATED): Updated. Also update copyright years. * manual/memory.texi (savestring): Optimize code to give a good example. * manual/filesys.texi: Fix wording. Patches by Jim Meyering. * nscd/nscd_getgr_r.c: Include stdint.h to get uintptr_t definition. * nscd/nscd_getpw_r.c: Likewise. * nscd/nscd_gethst_r.c: Likewise. * stdlib/stdtold_l.c: Always include xlocale.h. 1999-01-11 Geoffrey Keating * stdlib/fpioconst.h (LDBL_MAX_10_EXP_LOG): Define to be same as DBL_MAX_10_EXP_LOG if there is no long double. 
(_fpioconst_pow10): Always use size as LDBL_MAX_10_EXP_LOG to match printf_fp.c. 1999-01-10 Andreas Jaeger * timezone/Makefile ($(testdata)/GB): Changed to ... ($(testdata)/Europe/London): ... for tst-timezone test. ($(objpfx)tst-timezone.out): Change GB to Europe/London. * timezone/tst-timezone.c (main): Enable DST switching test, change GB to Europe/London. 1999-01-10 Philip Blundell * socket/Makefile (headers): Remove bits/sockunion.h. 1999-01-09 Philip Blundell * socket/sys/socket.h: Don't include . * sysdeps/generic/bits/sockunion.h: Deleted. * sysdeps/unix/sysv/linux/bits/sockunion.h: Likewise. 1999-01-08 H.J. Lu * io/fts.c (fts_close): Don't access memory after having it freed. --- manual/ctype.texi | 521 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 513 insertions(+), 8 deletions(-) (limited to 'manual/ctype.texi') diff --git a/manual/ctype.texi b/manual/ctype.texi index 26e40a1..de90acb 100644 --- a/manual/ctype.texi +++ b/manual/ctype.texi @@ -15,11 +15,20 @@ are affected by the current locale. (More precisely, they are affected by the locale currently selected for character classification---the @code{LC_CTYPE} category; see @ref{Locale Categories}.) -@menu -* Classification of Characters:: Testing whether characters are - letters, digits, punctuation, etc. +The @w{ISO C} standard specifies two different sets of functions. The +one set works on @code{char} type characters, the other one on +@code{wchar_t} wide character (@pxref{Extended Char Intro}). -* Case Conversion:: Case mapping, and the like. +@menu +* Classification of Characters:: Testing whether characters are + letters, digits, punctuation, etc. + +* Case Conversion:: Case mapping, and the like. +* Classification of Wide Characters:: Character class determination for + wide characters. +* Using Wide Char Classes:: Notes on using the wide character + classes. +* Wide Character Case Conversion:: Mapping of wide characters. 
@end menu @node Classification of Characters, Case Conversion, , Character Handling @@ -57,14 +66,16 @@ These functions are declared in the header file @file{ctype.h}. @comment ctype.h @comment ISO @deftypefun int islower (int @var{c}) -Returns true if @var{c} is a lower-case letter. +Returns true if @var{c} is a lower-case letter. The letter need not be +from the Latin alphabet; any representable alphabet is valid. @end deftypefun @cindex upper-case character @comment ctype.h @comment ISO @deftypefun int isupper (int @var{c}) -Returns true if @var{c} is an upper-case letter. +Returns true if @var{c} is an upper-case letter. The letter need not be +from the Latin alphabet; any representable alphabet is valid. @end deftypefun @cindex alphabetic character @@ -188,7 +199,7 @@ into the US/UK ASCII character set. This function is a BSD extension and is also an SVID extension. @end deftypefun -@node Case Conversion, , Classification of Characters, Character Handling +@node Case Conversion, Classification of Wide Characters, Classification of Characters, Character Handling @section Case Conversion @cindex character case conversion @cindex case conversion of characters @@ -224,7 +235,7 @@ lower-case letter. If @var{c} is not an upper-case letter, @comment ctype.h @comment ISO @deftypefun int toupper (int @var{c}) -If @var{c} is a lower-case letter, @code{tolower} returns the corresponding +If @var{c} is a lower-case letter, @code{toupper} returns the corresponding upper-case letter. Otherwise @var{c} is returned unchanged. @end deftypefun @@ -249,3 +260,497 @@ with the SVID. @xref{SVID}.@refill This is identical to @code{toupper}, and is provided for compatibility with the SVID. @end deftypefun + + +@node Classification of Wide Characters, Using Wide Char Classes, Case Conversion, Character Handling +@section Character class determination for wide characters + +Amendment 1 to @w{ISO C89} defines functions to classify wide +characters. 
Although the original @w{ISO C89} standard already defined +the type @code{wchar_t}, no functions operating on it were defined. + +The design of the classification functions for wide characters +is more general. It allows extending the set of available +classifications beyond the set which is always available. The POSIX +standard specifies how the extension can be done and this is +already implemented in the GNU C library implementation of the +@code{localedef} program. + +The character class functions are normally implemented using bitsets. +I.e., for the character in question the appropriate bitset is read from +a table and a test is performed whether a certain bit is set in this +bitset. Which bit is tested for is determined by the class. + +For the wide character classification functions this is made visible. +There is a type representing the classification, a function to retrieve +this value for a specific class, and a function to test using the +classification value whether a given character is in this class. On top +of this the normal character classification functions as used for +@code{char} objects can be defined. + +@comment wctype.h +@comment ISO +@deftp {Data type} wctype_t +The @code{wctype_t} type can hold a value which represents a character class. +The only defined way to generate such a value is by using the +@code{wctype} function. + +@pindex wctype.h +This type is defined in @file{wctype.h}. +@end deftp + +@comment wctype.h +@comment ISO +@deftypefun wctype_t wctype (const char *@var{property}) +The @code{wctype} function returns a value representing a class of wide +characters which is identified by the string @var{property}. Besides +some standard properties, each locale can define its own. In case +no property with the given name is known for the current locale for the +@code{LC_CTYPE} category, the function returns zero. 
+ +@noindent +The properties known in every locale are: + +@multitable @columnfractions .25 .25 .25 .25 +@item +@code{"alnum"} @tab @code{"alpha"} @tab @code{"cntrl"} @tab @code{"digit"} +@item +@code{"graph"} @tab @code{"lower"} @tab @code{"print"} @tab @code{"punct"} +@item +@code{"space"} @tab @code{"upper"} @tab @code{"xdigit"} +@end multitable + +@pindex wctype.h +This function is declared in @file{wctype.h}. +@end deftypefun + +To test whether a character belongs to one of the non-standard classes, +the @w{ISO C} standard defines a completely new function. + +@comment wctype.h +@comment ISO +@deftypefun int iswctype (wint_t @var{wc}, wctype_t @var{desc}) +This function returns a nonzero value if @var{wc} is in the character +class specified by @var{desc}. @var{desc} must have been returned +by a previous successful call to @code{wctype}. + +@pindex wctype.h +This function is declared in @file{wctype.h}. +@end deftypefun + +To make it easier to use the commonly used classification functions, +they are defined in the C library. There is no need to use +@code{wctype} if the property string is one of the known character +classes. In some situations it is desirable to construct the property +string and then it becomes important that @code{wctype} can also handle the +standard classes. + +@cindex alphanumeric character +@comment wctype.h +@comment ISO +@deftypefun int iswalnum (wint_t @var{wc}) +This function returns a nonzero value if @var{wc} is an alphanumeric +character (a letter or number); in other words, if either @code{iswalpha} +or @code{iswdigit} is true of a character, then @code{iswalnum} is also +true. + +@noindent +This function can be implemented using + +@smallexample +iswctype (wc, wctype ("alnum")) +@end smallexample + +@pindex wctype.h +This function is declared in @file{wctype.h}. 
+@end deftypefun + +@cindex alphabetic character +@comment wctype.h +@comment ISO +@deftypefun int iswalpha (wint_t @var{wc}) +Returns true if @var{wc} is an alphabetic character (a letter). If +@code{iswlower} or @code{iswupper} is true of a character, then +@code{iswalpha} is also true. + +In some locales, there may be additional characters for which +@code{iswalpha} is true---letters which are neither upper case nor lower +case. But in the standard @code{"C"} locale, there are no such +additional characters. + +@noindent +This function can be implemented using + +@smallexample +iswctype (wc, wctype ("alpha")) +@end smallexample + +@pindex wctype.h +This function is declared in @file{wctype.h}. +@end deftypefun + +@cindex control character +@comment wctype.h +@comment ISO +@deftypefun int iswcntrl (wint_t @var{wc}) +Returns true if @var{wc} is a control character (that is, a character that +is not a printing character). + +@noindent +This function can be implemented using + +@smallexample +iswctype (wc, wctype ("cntrl")) +@end smallexample + +@pindex wctype.h +This function is declared in @file{wctype.h}. +@end deftypefun + +@cindex digit character +@comment wctype.h +@comment ISO +@deftypefun int iswdigit (wint_t @var{wc}) +Returns true if @var{wc} is a digit (e.g., @samp{0} through @samp{9}). +Please note that this function returns a nonzero value not only for +@emph{decimal} digits, but for all kinds of digits. A consequence is +that code like the following will @strong{not} work unconditionally for +wide characters: + +@smallexample +n = 0; +while (iswdigit (*wc)) + @{ + n *= 10; + n += *wc++ - L'0'; + @} +@end smallexample + +@noindent +This function can be implemented using + +@smallexample +iswctype (wc, wctype ("digit")) +@end smallexample + +@pindex wctype.h +This function is declared in @file{wctype.h}. 
+@end deftypefun + +@cindex graphic character +@comment wctype.h +@comment ISO +@deftypefun int iswgraph (wint_t @var{wc}) +Returns true if @var{wc} is a graphic character; that is, a character +that has a glyph associated with it. The whitespace characters are not +considered graphic. + +@noindent +This function can be implemented using + +@smallexample +iswctype (wc, wctype ("graph")) +@end smallexample + +@pindex wctype.h +This function is declared in @file{wctype.h}. +@end deftypefun + +@cindex lower-case character +@comment wctype.h +@comment ISO +@deftypefun int iswlower (wint_t @var{wc}) +Returns true if @var{wc} is a lower-case letter. The letter need not be +from the Latin alphabet; any representable alphabet is valid. + +@noindent +This function can be implemented using + +@smallexample +iswctype (wc, wctype ("lower")) +@end smallexample + +@pindex wctype.h +This function is declared in @file{wctype.h}. +@end deftypefun + +@cindex printing character +@comment wctype.h +@comment ISO +@deftypefun int iswprint (wint_t @var{wc}) +Returns true if @var{wc} is a printing character. Printing characters +include all the graphic characters, plus the space (@samp{ }) character. + +@noindent +This function can be implemented using + +@smallexample +iswctype (wc, wctype ("print")) +@end smallexample + +@pindex wctype.h +This function is declared in @file{wctype.h}. +@end deftypefun + +@cindex punctuation character +@comment wctype.h +@comment ISO +@deftypefun int iswpunct (wint_t @var{wc}) +Returns true if @var{wc} is a punctuation character. +This means any printing character that is not alphanumeric or a space +character. + +@noindent +This function can be implemented using + +@smallexample +iswctype (wc, wctype ("punct")) +@end smallexample + +@pindex wctype.h +This function is declared in @file{wctype.h}. 
+@end deftypefun + +@cindex whitespace character +@comment wctype.h +@comment ISO +@deftypefun int iswspace (wint_t @var{wc}) +Returns true if @var{wc} is a @dfn{whitespace} character. In the standard +@code{"C"} locale, @code{iswspace} returns true for only the standard +whitespace characters: + +@table @code +@item L' ' +space + +@item L'\f' +formfeed + +@item L'\n' +newline + +@item L'\r' +carriage return + +@item L'\t' +horizontal tab + +@item L'\v' +vertical tab +@end table + +@noindent +This function can be implemented using + +@smallexample +iswctype (wc, wctype ("space")) +@end smallexample + +@pindex wctype.h +This function is declared in @file{wctype.h}. +@end deftypefun + +@cindex upper-case character +@comment wctype.h +@comment ISO +@deftypefun int iswupper (wint_t @var{wc}) +Returns true if @var{wc} is an upper-case letter. The letter need not be +from the Latin alphabet; any representable alphabet is valid. + +@noindent +This function can be implemented using + +@smallexample +iswctype (wc, wctype ("upper")) +@end smallexample + +@pindex wctype.h +This function is declared in @file{wctype.h}. +@end deftypefun + +@cindex hexadecimal digit character +@comment wctype.h +@comment ISO +@deftypefun int iswxdigit (wint_t @var{wc}) +Returns true if @var{wc} is a hexadecimal digit. +Hexadecimal digits include the normal decimal digits @samp{0} through +@samp{9} and the letters @samp{A} through @samp{F} and +@samp{a} through @samp{f}. + +@noindent +This function can be implemented using + +@smallexample +iswctype (wc, wctype ("xdigit")) +@end smallexample + +@pindex wctype.h +This function is declared in @file{wctype.h}. +@end deftypefun + +The GNU C library also provides a function which is not defined in the +@w{ISO C} standard but which is available as a version for single byte +characters as well. 
+ +@cindex blank character +@comment wctype.h +@comment GNU +@deftypefun int iswblank (wint_t @var{wc}) +Returns true if @var{wc} is a blank character; that is, a space or a tab. +This function is a GNU extension. It is declared in @file{wctype.h}. +@end deftypefun + +@node Using Wide Char Classes, Wide Character Case Conversion, Classification of Wide Characters, Character Handling +@section Notes on using the wide character classes + +The first note is probably nothing astonishing but still occasionally a +cause of problems. The @code{isw@var{XXX}} functions can be implemented +using macros and in fact, the GNU C library does this. They are still +available as real functions but when the @file{wctype.h} header is +included the macros will be used. This is nothing new compared to the +@code{char} type versions of these functions. + +The second note covers something which is new. It can be best +illustrated by a (real-world) example. The first piece of code is an +excerpt from the original code. It is truncated a bit but the intention +should be clear. + +@smallexample +int +is_in_class (int c, const char *class) +@{ + if (strcmp (class, "alnum") == 0) + return isalnum (c); + if (strcmp (class, "alpha") == 0) + return isalpha (c); + if (strcmp (class, "cntrl") == 0) + return iscntrl (c); + ... + return 0; +@} +@end smallexample + +Now with the @code{wctype} and @code{iswctype} functions one could avoid the +@code{if} cascades. But rewriting the code as follows is wrong: + +@smallexample +int +is_in_class (int c, const char *class) +@{ + wctype_t desc = wctype (class); + return desc ? iswctype ((wint_t) c, desc) : 0; +@} +@end smallexample + +The problem is that it is not guaranteed that the wide character +representation of a single-byte character can be found using casting. +In fact, usually this fails miserably. 
The correct solution for this +problem is to write the code as follows: + +@smallexample +int +is_in_class (int c, const char *class) +@{ + wctype_t desc = wctype (class); + return desc ? iswctype (btowc (c), desc) : 0; +@} +@end smallexample + +@xref{Converting a Character}, for more information on @code{btowc}. +Please note that this change probably does not improve the performance +of the program a lot since the @code{wctype} function still has to make +the string comparisons. But it gets really interesting if the +@code{is_in_class} function is called more than once using the +same class name. In this case the variable @var{desc} could be computed +once and reused for all the calls. Therefore the above form of the +function is probably not the final one. + + +@node Wide Character Case Conversion, , Using Wide Char Classes, Character Handling +@section Mapping of wide characters + +As for the classification functions, the @w{ISO C} standard also +generalizes the mapping functions. Instead of only allowing the two +standard mappings, the locale can contain others. Again, the +@code{localedef} program already supports generating such locale data +files. + +@comment wctype.h +@comment ISO +@deftp {Data Type} wctrans_t +This data type is defined as a scalar type which can hold a value +representing the locale-dependent character mapping. There is no way to +construct such a value other than using the return value of the +@code{wctrans} function. + +@pindex wctype.h +@noindent +This type is defined in @file{wctype.h}. +@end deftp + +@comment wctype.h +@comment ISO +@deftypefun wctrans_t wctrans (const char *@var{property}) +The @code{wctrans} function has to be used to find out whether a named +mapping is defined in the current locale selected for the +@code{LC_CTYPE} category. If the returned value is non-zero it can +afterwards be used in calls to @code{towctrans}. If the return value is +zero no such mapping is known in the current locale. 
+ +Besides locale-specific mappings there are two mappings which are +guaranteed to be available in every locale: + +@multitable @columnfractions .5 .5 +@item +@code{"tolower"} @tab @code{"toupper"} +@end multitable + +@pindex wctype.h +@noindent +This function is declared in @file{wctype.h}. +@end deftypefun + +@comment wctype.h +@comment ISO +@deftypefun wint_t towctrans (wint_t @var{wc}, wctrans_t @var{desc}) +The @code{towctrans} function maps the input character @var{wc} +according to the rules of the mapping for which @var{desc} is a +descriptor and returns the value so found. The @var{desc} value must be +obtained by a successful call to @code{wctrans}. + +@pindex wctype.h +@noindent +This function is declared in @file{wctype.h}. +@end deftypefun + +The @w{ISO C} standard also defines for the generally available mappings +convenient shortcuts so that it is not necessary to call @code{wctrans} +for them. + +@comment wctype.h +@comment ISO +@deftypefun wint_t towlower (wint_t @var{wc}) +If @var{wc} is an upper-case letter, @code{towlower} returns the corresponding +lower-case letter. If @var{wc} is not an upper-case letter, +@var{wc} is returned unchanged. + +@pindex wctype.h +@noindent +This function is declared in @file{wctype.h}. +@end deftypefun + +@comment wctype.h +@comment ISO +@deftypefun wint_t towupper (wint_t @var{wc}) +If @var{wc} is a lower-case letter, @code{towupper} returns the corresponding +upper-case letter. Otherwise @var{wc} is returned unchanged. + +@pindex wctype.h +@noindent +This function is declared in @file{wctype.h}. +@end deftypefun + +The same warnings given in the last section for the use of the wide +character classification functions apply here. It is not possible to +simply cast a @code{char} type value to a @code{wint_t} and use it as an +argument for @code{towctrans} calls. -- cgit v1.1
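Note on the patch: the new "Using Wide Char Classes" section remarks that the `wctype` descriptor could be computed once and reused across calls. A minimal sketch of that idea follows. It is not part of the commit; the helper names `byte_in_class` and `count_in_class` are hypothetical and chosen only for illustration. It uses only `wctype`, `iswctype`, and `btowc` as described in the patched text, and it treats bytes that `btowc` cannot convert as belonging to no class, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>   /* btowc, wint_t, WEOF */
#include <wctype.h>  /* wctype, iswctype, wctype_t */

/* Hypothetical helper: test the single-byte character C against a
   class descriptor DESC obtained earlier from wctype.  The byte is
   converted with btowc rather than cast to wint_t.  */
static int
byte_in_class (int c, wctype_t desc)
{
  wint_t wc;

  if (desc == (wctype_t) 0)
    return 0;                   /* Unknown class name.  */
  wc = btowc (c);
  if (wc == WEOF)
    return 0;                   /* Assumption: unconvertible bytes match nothing.  */
  return iswctype (wc, desc) != 0;
}

/* Hypothetical helper: count the bytes of S that belong to the class
   named CLASS_NAME.  The descriptor is looked up once and reused for
   every character, so the string comparison inside wctype is paid
   only once per call.  */
static size_t
count_in_class (const char *s, const char *class_name)
{
  wctype_t desc;
  size_t n = 0;

  assert (s != NULL && class_name != NULL);
  desc = wctype (class_name);
  for (; *s != '\0'; ++s)
    n += byte_in_class ((unsigned char) *s, desc);
  return n;
}
```

A caller that tests many strings against the same class could go one step further and keep the `wctype_t` value in a variable of its own, passing it to `byte_in_class` directly. The descriptor depends on the `LC_CTYPE` locale in effect when `wctype` was called, so it should be looked up again after a locale change.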