author | Jeff Johnston <jjohnstn@redhat.com> | 2004-09-13 17:10:20 +0000 |
---|---|---|
committer | Jeff Johnston <jjohnstn@redhat.com> | 2004-09-13 17:10:20 +0000 |
commit | d4c8e53b22246384fe8f9d259e5cc0f28c660c40 (patch) | |
tree | 97cfc576e5810d6881cf12f27e3f753d367ba1bc /newlib | |
parent | de4e88a8dfabd86b221ecc05a2b41d6375640891 (diff) | |
2004-09-13 Artem B. Bityuckiy <dedekind@oktetlabs.ru>
* libc/iconv/iconv.tex: Updated with new content.
* libc/iconv/lib/iconvnls.c: Reference ICONV_DEFAULT_NLSPATH
instead of NLS_DEFAULT_NLSPATH.
* libc/iconv/lib/iconvnls.h: Fix typo.
* libc/include/sys/iconvnls.h: New file.
Diffstat (limited to 'newlib')
-rw-r--r-- | newlib/ChangeLog | 8 |
-rw-r--r-- | newlib/libc/iconv/iconv.tex | 1107 |
-rw-r--r-- | newlib/libc/iconv/lib/iconvnls.c | 2 |
-rw-r--r-- | newlib/libc/iconv/lib/iconvnls.h | 6 |
-rw-r--r-- | newlib/libc/include/sys/iconvnls.h | 77 |
5 files changed, 1055 insertions, 145 deletions
diff --git a/newlib/ChangeLog b/newlib/ChangeLog index 5ac279f..db60803 100644 --- a/newlib/ChangeLog +++ b/newlib/ChangeLog @@ -1,3 +1,11 @@ +2004-09-13 Artem B. Bityuckiy <dedekind@oktetlabs.ru> + + * libc/iconv/iconv.tex: Updated with new content. + * libc/iconv/lib/iconvnls.c: Reference ICONV_DEFAULT_NLSPATH + instead of NLS_DEFAULT_NLSPATH. + * libc/iconv/lib/iconvnls.h: Fix typo. + * libc/include/sys/iconvnls.h: New file. + 2004-09-09 Paul Brook <paul@codesourcery.com> * libc/include/sys/reent.h (struct _on_exit_args): Add _dso_handle diff --git a/newlib/libc/iconv/iconv.tex b/newlib/libc/iconv/iconv.tex index d8aa8a1..305c3ce 100644 --- a/newlib/libc/iconv/iconv.tex +++ b/newlib/libc/iconv/iconv.tex @@ -6,11 +6,18 @@ The iconv functions declarations are in @file{iconv.h}. @menu -* iconv:: Encoding conversion routines -* Introduction:: Introduction to iconv and encodings -* Supported encodings:: The list of currently supported encodings -* iconv design decisions:: General iconv library design issues and decisions -* iconv configuration:: iconv-related configure script options +* iconv:: Encoding conversion routines +* Introduction:: Introduction to iconv and encodings +* Supported encodings:: The list of currently supported encodings +* iconv design decisions:: General iconv library design issues +* iconv configuration:: iconv-related configure script options +* Encoding names:: How encodings are named. +* CCS tables:: CCS tables format and 'mktbl.pl' Perl script +* CES converters:: CES converters description +* The encodings description file:: The 'encoding.deps' file and 'mkdeps.pl' +* How to add new encoding:: The steps to add new encoding support +* The locale support interfaces:: Locale-related iconv interfaces +* Contact:: The author contact @end menu @page @@ -26,29 +33,30 @@ The iconv functions declarations are in @findex CCS @* The iconv library is intended to convert characters from one encoding to -another. 
It implements iconv(), iconv_open() and iconv_close() calls -defined by the Single Unix Specification. +another. It implements iconv(), iconv_open() and iconv_close() +calls, which are defined by the Single Unix Specification. @* In addition to these user-level interfaces, the iconv library also has -several useful internal interfaces which are needed to support coding -capabilities of the Locale infrastructure. Since Locale also needs to -convert various character sets to and from Wide characters set, iconv -library shares it's capabilities with Locale subsystem. Moreover, iconv -supports several features which are only needed for Locale infrastructure -(for example, the MB_CUR_MAX value). +several useful interfaces which are needed to support coding +capabilities of the Newlib Locale infrastructure. Since Locale +support also needs to +convert various character sets to and from the @emph{wide characters +set}, the iconv library shares it's capabilities with the Newlib Locale +subsystem. Moreover, the iconv library supports several features which are +only needed for the Locale infrastructure (for example, the MB_CUR_MAX value). @* -The Newlib iconv library was created using ideas of another iconv -library implemented by Konstantin Chuguev (ver 2.0). Thus, the Newlib iconv -library has double Copyright. The Newlib iconv library was rewritten from -scratch by Artem B. Bityuckiy and contains a lot of improvements with respect -to original iconv library. +The Newlib iconv library was created using concepts from another iconv +library implemented by Konstantin Chuguev (ver 2.0). The Newlib iconv library +was rewritten from scratch and contains a lot of improvements with respect to +the original iconv library. @* Terms like @dfn{encoding} or @dfn{character set} aren't well defined and -are used with various meanings. The following is definitions of terms -used in this documentation as well as in iconv library implementation: +are often used with various meanings. 
The following are the definitions of terms +which are used in this documentation as well as in the iconv library +implementation: @itemize @bullet @item @@ -56,7 +64,7 @@ used in this documentation as well as in iconv library implementation: @item @dfn{Character Set} or @dfn{Charset} - just a collection of -characters, i.e. encoding is a machine representation of character set; +characters, i.e. the encoding is the machine representation of the character set; @item @dfn{CCS} (@dfn{Coded Character Set}) - a mapping from an character set to a @@ -69,18 +77,18 @@ codes to a sequence of bytes; @* Users usually deal with encodings, for example, KOI8-R, Unicode, UTF-8, -ASCII, etc. Encodings are formed by the following chain: +ASCII, etc. Encodings are formed by the following chain of steps: @enumerate @item -User has a set of characters specific to his language (character set). +User has a set of characters which are specific to his or her language (character set). @item -Each character from this set uniquely numbered, resulting in CCS. +Each character from this set is uniquely numbered, resulting in an CCS. @item -Each number from CCS is converted to a sequence of bits or bytes by means -of CES resulting in some encoding. Thus, CES may be considered as a +Each number from the CCS is converted to a sequence of bits or bytes by means +of a CES and form some encoding. Thus, CES may be considered as a function of CCS which produces some encoding. Note, that CES may be applied to more than one CCS. @end enumerate @@ -89,27 +97,28 @@ applied to more than one CCS. Thus, an encoding may be considered as one or more CCS + CES. @* -Sometimes, there is no CES and in such cases Encoding is equivalent to CCS, -e.g. KOI8-R or ASCII. +Sometimes, there is no CES and in such cases encoding is equivalent +to CCS, e.g. KOI8-R or ASCII. @* -The example of more complicated encoding is UTF-8 which is the UCS -(or Unicode) CCS plus UTF-8 CES. 
+An example of a more complicated encoding is UTF-8 which is the UCS +(or Unicode) CCS plus the UTF-8 CES. @* The following is a brief list of iconv library features: @itemize @item -Generic architecture +Generic architecture; @item -Locale infrastructure support +Locale infrastructure support; @item -Automatic generation of code which handles various CES/CCS/Encoding/Names/Aliases -dependencies. +Automatic generation of the program code which handles +CES/CCS/Encoding/Names/Aliases dependencies; @item -The possibility to choose size- or speed-optimazed configuration +The ability to choose size- or speed-optimazed +configuration; @item -The possibility to exclude almost all unneeded code from linking. +The ability to exclude a lot of unneeded code and data from the linking step. @end itemize @@ -169,9 +178,10 @@ The possibility to exclude almost all unneeded code from linking. @findex win_1257 @findex win_1258 @* -The following is a list of currently supported encodings. The first column -corresponds to encoding name, the second to the list of its aliases, third -- to its CES and CCS components names, fourth - to its short description. +The following is the list of currently supported encodings. The first column +corresponds to the encoding name, the second column is the list of aliases, +the third column is its CES and CCS components names, and the fourth column +is a short description. @multitable @columnfractions .20 .26 .24 .30 @item @@ -195,7 +205,7 @@ csbig5, big_five, bigfive, cn_big5, cp950 @tab table_pcs / big5, us_ascii @tab -An encoding for Traditional Chinese. +The encoding for the Traditional Chinese. @item @@ -205,7 +215,7 @@ ibm775, cspc775baltic @tab table / cp775 @tab -An updated version of CP 437 that supports balitic languages. +The updated version of CP 437 that supports the balitic languages. 
@item @@ -215,8 +225,9 @@ ibm850, 850, cspc850multilingual @tab table / cp850 @tab -IBM 850 - an updated version of CP 437 where several Latin 1 characters have been -added instead of some less-often used characters like line-drawing and greek ones. +IBM 850 - the updated version of CP 437 where several Latin 1 characters have been +added instead of some less-often used characters like the line-drawing +and the greek ones. @item @@ -225,8 +236,8 @@ cp852 ibm852, 852, cspcp852 @tab @tab -IBM 852 - an updated version of CP 437 where several Latin 2 characters have been added -instead of some less-often used characters like line-drawing and greek ones. +IBM 852 - the updated version of CP 437 where several Latin 2 characters have been added +instead of some less-often used characters like the line-drawing and the greek ones. @item @@ -236,7 +247,7 @@ ibm855, 855, csibm855 @tab table / cp855 @tab -IBM 855 - an updated version of CP 437 that supports Cyrillic. +IBM 855 - the updated version of CP 437 that supports Cyrillic. @item @@ -246,8 +257,8 @@ cp866 @tab table / cp866 @tab -IBM 866 - an updated version of CP 855 which followes the more logical Russian alphabet -ordering of the alternativny variant that is preferred by many Russian users. +IBM 866 - the updated version of CP 855 which follows more the logical Russian alphabet +ordering of the alternative variant that is preferred by many Russian users. @item @@ -447,7 +458,7 @@ koi8ru @tab table / koi8_ru @tab -Obsoleted Ukrainian. +The obsolete Ukrainian. @item @@ -700,68 +711,71 @@ Win-1258 - Vietnamese7 that supports Cyrillic. 
+ @page @node iconv design decisions @section iconv design decisions @findex CCS table @findex CES converter +@findex Speed-optimized tables +@findex Size-optimized tables @* The first iconv library design issue arises when considering the following two design approaches: @enumerate @item -Have modules which implement conversion from encoding A to encoding B -and vice versa, i.e., one conversion module relates to any two -encodings. +Have modules which implement conversion from the encoding A to the encoding B +and vice versa i.e., one conversion module relates to any two encodings. @item -Have modules which implement conversion from encoding A to fixed -encoding C and vice versa, i.e., on conversion module relates to any +Have modules which implement conversion from the encoding A to the fixed +encoding C and vice versa i.e., one conversion module relates to any one encoding A and one fixed encoding C. In this case, to convert from -encoding A to encoding B, two modules are needed in order to convert -from A to C and then from C to B. +the encoding A to the encoding B, two modules are needed (in order to convert +from A to C and then from C to B). @end enumerate @* -It's obvious, that we have a tradeoff between commonness/flexibility and +It's obvious, that we have tradeoff between commonality/flexibility and efficiency: the first method is more efficient since it converts -directly. But from other hand, it isn't so flexible since for each -encoding pair distinct module is needed. +directly; however, it isn't so flexible since for each +encoding pair a distinct module is needed. @* -The Newlib iconv uses the second method and always converts through 32 -bit UCS. But its design also allows to write specialized conversion +The Newlib iconv model uses the second method and always converts through the 32-bit +UCS but its design also allows one to write specialized conversion modules if the conversion speed is critical. 
@* -The second design issue is how to decompose encodings. +The second design issue is how to break down (decompose) encodings. The Newlib iconv library uses the fact that any encoding may be -considered as one or more CCS plus CES. It also decomposes its +considered as one or more CCS plus a CES. It also decomposes its conversion modules on @dfn{CES converter} plus one or more @dfn{CCS -tables}. CCS tables maps CCS to UCS and vice versa, CES converters -map CCS to encoding and vice versa. +tables}. CCS tables map CCS to UCS and vice versa; the CES converters +map CCS to the encoding and vice versa. @* -As an example, consider conversion from big5 encoding to EUC-TW -encoding. big5 encoding may be decomposed on ASCII and BIG5 CCSes plus -BIG5 CES. EUC-TW may be decomposed on CNS11643_PLANE1, CNS11643_PLANE2, -and CNS11643_PLANE14 CCSes plus EUC CES. +As the example, let's consider the conversion from the big5 encoding to +the EUC-TW encoding. The big5 encoding may be decomposed to the ASCII and BIG5 +CCS-es plus the BIG5 CES. EUC-TW may be decomposed on the CNS11643_PLANE1, CNS11643_PLANE2, +and CNS11643_PLANE14 CCS-es plus the EUC CES. 
@* -The euc_jp -> big5 conversion happens as follows: +The euc_jp -> big5 conversion is performed as follows: @enumerate @item -EUC converter performs EUC-TW encoding to correspondent CCSes transformation -(CNS11643_PLANE1, CNS11643_PLANE2 and CNS11643_PLANE14 CCSes); +The EUC converter performs the EUC-TW encoding to the corresponding CCS-es +transformation (CNS11643_PLANE1, CNS11643_PLANE2 and CNS11643_PLANE14 +CCS-es); @item -Obtained CCS codes are transformed to UCS codes using CNS11643_PLANE1, +The obtained CCS codes are transformed to the UCS codes using the CNS11643_PLANE1, CNS11643_PLANE2 and CNS11643_PLANE14 CCS tables; @item -Resulting UCS codes are transformed to ASCII and BIG5 codes using -correspondent CCS tables; +The resulting UCS codes are transformed to the ASCII and BIG5 codes using +the corresponding CCS tables; @item -Obtained CCS codes are transformed to big5 encoding using correspondent +The obtained CCS codes are transformed to the big5 encoding using the corresponding CES converter. @end enumerate @@ -770,116 +784,927 @@ Analogously, the backward conversion is performed as follows: @enumerate @item -BIG converter performs big5 encoding -> correspondent CCSes transformation -(ASCII and BIG5 CCSes); +The BIG5 converter performs the big5 encoding to the corresponding CCS-es transformation +(the ASCII and BIG5 CCS-es); @item -Obtained CCS codes are transformed to UCS codes using ASCII and BIG5 CCS tables; +The obtained CCS codes are transformed to the UCS codes using the ASCII and BIG5 CCS tables; @item -Resulting UCS codes are transformed to ASCII and BIG5 codes using -correspondent CCS tables; +The resulting UCS codes are transformed to the ASCII and BIG5 codes using +the corresponding CCS tables; @item -Obtained CCS codes are transformed to EUC-TW encoding using correspondent +The obtained CCS codes are transformed to the EUC-TW encoding using the corresponding CES converter. 
@end enumerate @* -Note, the above is just an example and real names (implemented in Newlib -iconv) of CES converters and CCS tables are slightly different. +Note, the above is just an example and real names (which are implemented +in the Newlib iconv) of the CES converters and the CCS tables are slightly different. @* The third design issue also relates to flexibility. Obviously, it isn't -wanted to always link all CES converters and CCS tables to the library -but instead, it is wanted to be able to load needed converters and tables -dynamically on demand. This isn't a problem on "big" machines like PC -but may be very problematical within "small" embedded systems. +desirable to always link all the CES converters and the CCS tables to the library +but instead, we want to be able to load the needed converters and tables +dynamically on demand. This isn't a problem on "big" machines such as +a PC, but it may be very problematical within "small" embedded systems. @* Since the CCS tables are just data, it is possible to load them -dynamically from external files. Instead, CES converters are algorithms -and contain some code and the dynamic library loading capability is needed. +dynamically from external files. The CES converters, on the other hand +are algorithms with some code so a dynamic library loading +capability is required. @* -Apart from possible restrictions applied by embedded systems (too few -RAM for example), the Newlib itself has no dynamic libraries support and, -therefore, all CES converters which will ever be uses must be linked into -the library. But the dynamic CCS tables loading is possible and is -implemented in the Newlib iconv library and may be enabled via Newlib +Apart from possible restrictions applied by embedded systems (small +RAM for example), Newlib itself has no dynamic library support and +therefore, all the CES converters which will ever be used must be linked into +the library. 
However, loading of the dynamic CCS tables is possible and is +implemented in the Newlib iconv library. It may be enabled via the Newlib configure script options. @* -The next design decision is the possibility to of fine iconv library -configuring. This means, that iconv isn't always link all it's -converters and tables (if no dynamical loading enabled) but instead, it -gives the possibility to enable only those encodings which are planned -to be used (see section about configure script options). +The next design issue is fine-tuning the iconv library +configuration. One important ability is for iconv to not link all it's +converters and tables (if dynamic loading is not enabled) but instead, +enable only those encodings which are specified at configuration +time (see the section about the configure script options). @* -Moreover, the Newlib iconv library configure options distinguish between -coding directions. This means, that not only supported encodings are -selectable, but the coding direction too. For example, if user wants -configuration which allows conversions from UTF-8 to UTF-16 and he -doesn't plan to use UTF-16 to UTF-8 conversions, he can enable exactly -that conversion direction (i.e., no UTF-16 -> UTF-8 -related code will -be included) thus saving some memory (note, that such technique allows to -exclude one half of CCS table from linking which may be big enough). +In addition, the Newlib iconv library configure options distinguish between +conversion directions. This means that not only are supported encodings +selectable, the conversion direction is as well. 
For example, if user wants +the configuration which allows conversions from UTF-8 to UTF-16 and +doesn't plan using the "UTF-16 to UTF-8" conversions, he or she can +enable only +this conversion direction (i.e., no "UTF-16 -> UTF-8"-related code will +be included) thus, saving some memory (note, that such technique allows to +exclude one half of a CCS table from linking which may be big enough). @* -One more design decision is speed- and size- optimized tables. Used can -select between them using s configure script option. Speed CCS tables -are the same as Size ones in case of 8 bit CCS (e.g.m KOI8-R), but for 16 -bit CCS Size-optimized table may be in 1.5-2 time less then -Speed-optimized ones. From the other hand, the conversion with speed -tables is in several times faster. +One more design aspect are the speed- and size- optimized tables. Users can +select between them using configure script options. The +speed-optimized CCS tables are the same as the size-optimized ones in +case of 8-bit CCS (e.g.m KOI8-R), but for 16-bit CCS-es the size-optimized +CCS tables may be 1.5 to 2 times less then the speed-optimized ones. On the +other hand, conversion with speed tables is several times faster. @* -Its worth to stress, that new encodings support can't be -dynamically added into already compiled Newlib library. Even if this -needs only additional CCS table and iconv is configured to use external -files with CCS tables (this isn't a fundamental restriction and the -possibility to add new Table-based encodings support dynamically, by -copying new .cct file, may be easily added). 
+Its worth to stress that the new encoding support can't be +dynamically added into an already compiled Newlib library, even if it +needs only an additional CCS table and iconv is configured to use +the external files with CCS tables (this isn't the fundamental restriction +and the possibility to add new Table-based encoding support dynamically, by +means of just adding new .cct file, may be easily added). @* -Theoretically, the compiled-in CCS tables may be more appropriate foe -embedded solutions since they are read-only and are placed to ROM, -whereas the dynamic loading needs more RAM. Moreover, in current -implementation, distinct copy of CCS file is loaded for each fore each -opened iconv descriptor even in case of the same encoding. +Theoretically, the compiled-in CCS tables should be more appropriate for +embedded systems than dynamically loaded CCS tables. This is because the compiled-in tables are read-only and can be placed in ROM +whereas dynamic loading requires RAM. Moreover, in the current iconv +implementation, a distinct copy of the dynamic CCS file is loaded for each opened iconv descriptor even in case of the same encoding. This means, for example, that if two iconv descriptors for -KOI8-R -> UCS-4BE and KOI8-R -> UTF-16BE are opened, two copies of -koi8-r .cct file will be loaded (actually, iconv loads only needed part -of these files). - +"KOI8-R -> UCS-4BE" and "KOI8-R -> UTF-16BE" are opened, two copies of +koi8-r .cct file will be loaded (actually, iconv loads only the needed part +of these files). On the other hand, in the case of compiled-in CCS tables, there will always be only one copy. 
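Editor's illustration (not part of the patch): the second design approach described in this section, where every conversion goes through a fixed pivot encoding (UCS), can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the Newlib implementation: the table names, `invert` and `convert` are hypothetical, and the two dictionaries are tiny hand-picked excerpts of the KOI8-R and CP866 code tables rather than the real CCS table data.

```python
# Tiny illustrative excerpts of two 8-bit CCS tables (hypothetical names,
# not the real Newlib data). Each maps a CCS code to its UCS code point.
KOI8_R_TO_UCS = {0xE1: 0x0410, 0xE2: 0x0411, 0xC1: 0x0430, 0xC2: 0x0431}
CP866_TO_UCS = {0x80: 0x0410, 0x81: 0x0411, 0xA0: 0x0430, 0xA1: 0x0431}


def invert(to_ucs):
    """Build the "from UCS" direction out of a "to UCS" mapping."""
    return {ucs: ccs for ccs, ucs in to_ucs.items()}


def convert(data, src_to_ucs, dst_to_ucs):
    """Convert bytes from encoding A to encoding B through the UCS pivot."""
    dst_from_ucs = invert(dst_to_ucs)
    out = bytearray()
    for code in data:
        ucs = src_to_ucs[code]         # step 1: encoding A -> UCS
        out.append(dst_from_ucs[ucs])  # step 2: UCS -> encoding B
    return bytes(out)
```

With N encodings this scheme needs only N tables instead of one module per encoding pair, at the cost of two lookups per character. The sketch also hints at why selecting a single conversion direction can save space: only one of the two mappings per table is needed for a given direction.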
@page @node iconv configuration @section iconv configuration @findex iconv configuration +@findex --enable-newlib-iconv-encodings +@findex --enable-newlib-iconv-from-encodings +@findex --enable-newlib-iconv-to-encodings +@findex --enable-newlib-iconv-external-ccs +@findex NLSPATH @* -To enable encoding support --enable-newlib-iconv-encodings configure +To enable an encoding, the @emph{--enable-newlib-iconv-encodings} configure script option should be used. This option accepts a comma-separated list -of encodings that should be enabled. Option enables each encoding in both +of @emph{encodings} that should be enabled. The option enables each encoding in both ("to" and "from") directions. @* ---enable-newlib-iconv-from-encodings configure script option enables +The @option{--enable-newlib-iconv-from-encodings} configure script option enables "from" support for each encoding that was passed to it. @* ---enable-newlib-iconv-to-encodings configure script option enables +The @option{--enable-newlib-iconv-to-encodings} configure script option enables "to" support for each encoding that was passed to it. 
@* -Example: if user plans only KOI8-R -> UTF-8, UTF-8 -> ISO-8859-5 and -KOI8-R -> UCS-2 conversions, the most optimal way (minimal iconv's -code and data will be linked) is to configure Newlib with ---enable-newlib-iconv-encodings=UTF-8 +Example: if user plans only the "KOI8-R -> UTF-8", "UTF-8 -> ISO-8859-5" and +"KOI8-R -> UCS-2" conversions, the most optimal way (minimal iconv +code and data will be linked) is to configure Newlib with the following +options: +@* +@code{--enable-newlib-iconv-encodings=UTF-8 --enable-newlib-iconv-from-encodings=KOI8-R ---enable-newlib-iconv-to-encodings=KOI8-R,ISO-8859-5 +--enable-newlib-iconv-to-encodings=UCS-2,ISO-8859-5} +@* +which is the same as +@* +@code{--enable-newlib-iconv-from-encodings=KOI8-R,UTF-8 +--enable-newlib-iconv-to-encodings=UCS-2,ISO-8859-5,UTF-8} +@* +User may also just use the +@* +@code{--enable-newlib-iconv-encodings=KOI8-R,ISO-8859-5,UTF-8,UCS-2} +@* +configure script option, but it isn't so optimal since there will be +some unneeded data and code. + +@* +The @option{--enable-newlib-iconv-external-ccs} option enables iconv's +capabilities to work with the external CCS files. + +@* +The @option{--enable-target-optspace} Newlib configure script option also affects +the iconv library. If this option is present, the library uses the size +optimized CCS tables. This means, that only the size-optimized CCS +tables will be linked or, if the +@option{--enable-newlib-iconv-external-ccs} configure script option was used, +the iconv library will load the size-optimized tables. If the +@option{--enable-target-optspace}configure script option is disabled, +the speed-optimized CCS tables are used. + +@* +Note: .cct files are searched by iconv_open in the $NLSPATH/iconv_data/ directory. +Thus, the NLSPATH environment variable should be set. 
+ + + + + +@page +@node Encoding names +@section Encoding names +@findex encoding name +@findex encoding alias +@findex normalized name +@* +Each encoding has one @dfn{name} and a number of @dfn{aliases}. When +user works with the iconv library (i.e., when the @code{iconv_open} call +is used) both name or aliases may be used. The same is when encoding +names are used in configure script options. + +@* +Names and aliases may be specified in any case (small or capital +letters) and the @kbd{-} symbol is equivalent to the @kbd{_} symbol. +Also, when working with the iconv library, + +@* +Internally the Newlib iconv library always converts aliases to names. It +also converts names and aliases in the @dfn{normalized} form which means +that all capital letters are converted to small letters and the @kbd{-} +symbols are converted to @kbd{_} symbols. + + + + +@page +@node CCS tables +@section CCS tables +@findex Size-optimized CCS table +@findex Speed-optimized CCS table +@findex mktbl.pl Perl script +@findex .cct files +@findex The CCT tables source files +@findex CCS source files +@* +The iconv library stores files with CCS tables in the the @emph{ccs/} +subdirectory. The CCS tables for any CCS may be kept in two forms - in the binary form +(@dfn{.cct files}, see the @emph{ccs/binary/} subdirectory) and in form +of compilable .c source files. The .cct files are only used when the +@option{--enable-newlib-iconv-external-ccs} configure script option is enabled. +The .c files are linked to the Newlib library if the corresponding +encoding is enabled. + +@* +As stated earlier, the Newlib iconv library performs all +conversions through the 32-bit UCS, but the codes which are used +in most CCS-es, fit into the first 16-bit subset of the 32-bit UCS set. +Thus, in order to make the CCS tables more compact, the 16-bit UCS-2 is +used instead of the 32-bit UCS-4. + +@* +CCS tables may be 8- or 16-bit wide. 
8-bit CCS tables map 8-bit CCS to +16-bit UCS-2 and vice versa while 16-bit CCS tables map +16-bit CCS to 16-bit UCS-2 and vice versa. +8-bit tables are small (in size) while 16-bit tables may be big enough. +Because of this, 16-bit CCS tables may be +either speed- or size-optimized. Size-optimized CCS tables are +smaller then speed-optimized ones, but the conversion process is +slower if the size-optimized CCS tables are used. 8-bit CCS tables have only +size-optimized variant. + +Each CCS table (both speed- and size-optimized) consists of +@dfn{from_ucs} and @dfn{to_ucs} subtables. "from_ucs" subtable maps +UCS-2 codes to CCS codes, while "to_ucs" subtable maps CCS codes to +UCS-2 codes. + +@* +Almost all 16-bit CCS tables contain less then 0xFFFF codes and +a lot of gaps exist. + +@subsection Speed-optimized tables format +@* +In case of 8-bit speed-optimized CCS tables the "to_ucs" subtables format is +trivial - it is just the array of 256 16-bit UCS codes. Therefore, an +UCS-2 code @emph{Y} corresponding to a @emph{X} CCS code is calculates +as @emph{Y = to_ucs[X]}. + +@* +Obviously, the simplest way to create the "from_ucs" table or the +16-bit "to_ucs" table is to use the huge 16-bit array like in case +of the 8-bit "to_ucs" table. But almost all the 16-bit CCS tables contain +less then 0xFFFF code maps and this fact may be exploited to reduce +the size of the CCS tables. + +@* +In this chapter the "UCS-2 -> CCS" 8-bit CCS table format is described. The +16-bit "CCS -> UCS-2" CCS table format is the same, except the mapping +direction and the CCS bits number. + +@* +In case of the 8-bit speed-optimized table the "from_ucs" subtable +corresponds the "from_ucs" array and has the following layout: + +@* +from_ucs array: +@* +------------------------------------- +@* +0xFF mapping (2 bytes) (only for +8-bit table). 
+@*
+-------------------------------------
+@*
+Heading block
+@*
+-------------------------------------
+@*
+Block 1
+@*
+-------------------------------------
+@*
+Block 2
+@*
+-------------------------------------
+@*
+ ...
+@*
+-------------------------------------
+@*
+Block N
+@*
+-------------------------------------
+
+@*
+The 0x0000-0xFFFF 16-bit code range is divided into 256 code subranges. Each
+subrange is represented by a 256-element @dfn{block} (256 1-byte
+elements or 256 2-byte elements in case of a 16-bit CCS table) with
+elements which are equivalent to the CCS codes of this subrange.
+If the "UCS-2 -> CCS" mapping has big enough gaps, some blocks will be
+absent and there will be fewer than 256 blocks.
+
+@*
+Any element number @emph{m} of @dfn{the heading block} (which contains
+256 2-byte elements) corresponds to the @emph{m}-th 256-element subrange.
+If the subrange contains some codes, the value of the @emph{m}-th element of
+the heading block contains the offset of the corresponding block in the
+"from_ucs" array. If there are no codes in the subrange, the heading
+block element contains 0xFFFF.
 @*
---enable-newlib-iconv-external-ccs option enables iconv's
-capabilities to work with external CCS files.
+If there are some gaps in a block, the corresponding block elements have
+the 0xFF value. If there is a 0xFF code present in the CCS, its mapping
+is defined in the first 2-byte element of the "from_ucs" array.
+
+@*
+Having such a table format, the algorithm of searching the CCS code
+@emph{X} which corresponds to the UCS-2 code @emph{Y} is as follows.
+
+@*
+@enumerate
+@item If @emph{Y} is equivalent to the value of the first 2-byte element
+of the "from_ucs" array, @emph{X} is 0xFF. Else, continue to search.
+
+@item Calculate the block number: @emph{BlkN = (Y & 0xFF00) >> 8}.
+@item If the heading block element with number @emph{BlkN} is 0xFFFF, there
+is no corresponding CCS code (error, wrong input data). Else, fetch the
+"from_ucs" array index of the @emph{BlkN}-th block (@emph{BlkIdx}).
+
+@item Calculate the offset of the @emph{X} code in its block:
+@emph{Xindex = Y & 0xFF}
+
+@item If the @emph{Xindex}-th element of the block (which is equivalent to
+@emph{from_ucs[BlkIdx+Xindex]}) value is 0xFF, there is no corresponding
+CCS code (error, wrong input data). Else, @emph{X = from_ucs[BlkIdx+Xindex]}.
+@end enumerate
+
+@subsection Size-optimized tables format
+@*
+As stated above, size-optimized tables exist only for 16-bit CCS-es.
+This is because the difference between the speed-optimized
+and the size-optimized table sizes is too small in case of 8-bit CCS-es.
+
+@*
+Formats of the "to_ucs" and "from_ucs" subtables are equivalent in case of
+size-optimized tables.
+
+This section describes the format of the "UCS-2 -> CCS" size-optimized
+CCS table. The format of the "CCS -> UCS-2" table is the same.
+
+The idea of the size-optimized tables is to split the UCS-2 codes
+("from" codes) into @dfn{ranges} (a @dfn{range} is a number of consecutive UCS-2 codes).
+Then CCS codes ("to" codes) are stored only for the codes from these
+ranges. Distinct "from" codes which have no range (@dfn{unranged codes}) are stored
+together with the corresponding "to" codes.
+
+@*
+The following is the layout of the size-optimized table array:
+
+@*
+size_arr array:
+@*
+-------------------------------------
+@*
+Ranges number (2 bytes)
+@*
+-------------------------------------
+@*
+Unranged codes number (2 bytes)
+@*
+-------------------------------------
+@*
+Unranged codes array index (2 bytes)
+@*
+-------------------------------------
+@*
+Ranges indexes (triads)
+@*
+-------------------------------------
+@*
+Ranges
 @*
-Note: CCS files are searched by iconv_open in $NLSPATH/iconv_data/ directory.
+-------------------------------------
+@*
+Unranged codes array
+@*
+-------------------------------------
+
+@*
+The @dfn{Ranges indexes} @emph{size_arr} section helps to find
+the offset of the needed range in the @emph{size_arr} and has
+the following format (triads):
+@*
+the first code in range, the last code in range, range offset.
+
+@*
+The array of these triads is sorted by the first element, therefore it is
+possible to quickly find the needed range index.
+
+@*
+Each range has the corresponding sub-array containing the "to" codes. These
+sub-arrays are stored in the place marked as "Ranges" in the layout
+diagram.
+
+@*
+The "Unranged codes array" contains pairs ("from" code, "to" code) for
+each unranged code. The array of these pairs is sorted by "from" code
+values, therefore it is possible to find the needed pair quickly.
+
+@*
+Note that each range requires 6 bytes to form its index. If, for
+example, there are two ranges (1 - 5 and 9 - 10), and one unranged code
+(7), 12 bytes are needed for two range indexes and 4 bytes for the unranged
+code (total 16). But it is better to join both ranges as 1 - 10 and
+mark codes 6 and 8 as absent. In this case, only 6 additional bytes for the
+range index and 4 bytes to mark codes 6 and 8 as absent are needed
+(total 10 bytes). This optimization is done in the size-optimized tables.
+Thus, ranges may contain small gaps. The absent codes in ranges are marked
+as 0xFFFF.
+
+@*
+Note that a pair of "from" codes is stored by means of unranged codes since
+the number of bytes which are needed to form the range is greater than
+the number of bytes to store two unranged codes (10 against 8).
+
+@*
+The algorithm of searching for the CCS code
+@emph{X} which corresponds to the UCS-2 code @emph{Y} (input) in the "UCS-2 ->
+CCS" size-optimized table is as follows.
+
+@*
+@enumerate
+@item Try to find the corresponding triad in the "Ranges indexes"
+section. Since we are searching in the sorted array, we can do it quickly
+(divide by 2, compare, etc).
+
+@item If the triad is found, fetch the @emph{X} code from the corresponding
+range array. If it is 0xFFFF, return an error.
+
+@item If there is no corresponding triad, search the @emph{X} code among the
+sorted unranged codes. Return an error if nothing was found.
+@end enumerate
+
+@subsection .cct and .c CCS Table files
+@*
+The .c source files for 8-bit CCS tables have "to_ucs" and "from_ucs"
+speed-optimized tables. The .c source files for 16-bit CCS tables have
+"to_ucs_speed", "to_ucs_size", "from_ucs_speed" and "from_ucs_size"
+tables.
+
+@*
+When .c files are compiled and used, all the 16-bit and 32-bit values
+have the native endian format (Big Endian for the BE systems and Little
+Endian for the LE systems) since they are compiled for the system before
+they are used.
+
+@*
+In case of .cct files, which are intended for dynamic CCS tables
+loading, the CCS tables are stored either in LE or BE format. Since the
+.cct files are generated by the 'mktbl.pl' Perl script, it is possible
+to choose the endianness of the tables. It is also possible to store two
+copies (both LE and BE) of the CCS tables in one .cct file. The default
+.cct files (which come with the Newlib sources) have both LE and BE CCS
+tables. The Newlib iconv library automatically chooses the needed CCS tables
+(with appropriate endianness).
+
+@*
+Note, the .cct files are only used when the
+@option{--enable-newlib-iconv-external-ccs} configure option is used.
+
+@subsection The 'mktbl.pl' Perl script
+@*
+The 'mktbl.pl' script is intended to generate .cct and .c CCS table
+files from the @dfn{CCS source files}.
+
+@*
+The CCS source files are just text files which have one or more columns
+with CCS <-> UCS-2 codes mapping. To see an example of a CCS
+source file, follow one of the URLs given below.
+
+@*
+The following table describes where the source files for CCS table files
+provided by the Newlib distribution are located.
+
+@multitable @columnfractions .25 .75
+@item
+Name
+@tab
+URL
+
+@item
+@tab
+
+@item
+big5
+@tab
+http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT
+
+@item
+cns11643_plane1
+cns11643_plane14
+cns11643_plane2
+@tab
+http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/CNS11643.TXT
+
+@item
+cp775
+cp850
+cp852
+cp855
+cp866
+@tab
+http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/
+
+@item
+iso_8859_1
+iso_8859_2
+iso_8859_3
+iso_8859_4
+iso_8859_5
+iso_8859_6
+iso_8859_7
+iso_8859_8
+iso_8859_9
+iso_8859_10
+iso_8859_11
+iso_8859_13
+iso_8859_14
+iso_8859_15
+@tab
+http://www.unicode.org/Public/MAPPINGS/ISO8859/
+
+@item
+iso_ir_111
+@tab
+http://crl.nmsu.edu/~mleisher/csets/ISOIR111.TXT
+
+@item
+jis_x0201_1976
+jis_x0208_1990
+jis_x0212_1990
+@tab
+http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0201.TXT
+
+@item
+koi8_r
+@tab
+http://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT
+
+@item
+koi8_ru
+@tab
+http://crl.nmsu.edu/~mleisher/csets/KOI8RU.TXT
+
+@item
+koi8_u
+@tab
+http://crl.nmsu.edu/~mleisher/csets/KOI8U.TXT
+
+@item
+koi8_uni
+@tab
+http://crl.nmsu.edu/~mleisher/csets/KOI8UNI.TXT
+
+@item
+ksx1001
+@tab
+http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/KSX1001.TXT
+
+@item
+win_1250
+win_1251
+win_1252
+win_1253
+win_1254
+win_1255
+win_1256
+win_1257
+win_1258
+@tab
+http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/
+@end multitable
+
+The CCS source files aren't distributed with Newlib because of License
+restrictions in most Unicode.org's files.
+
+The following are 'mktbl.pl' options which were used to generate .cct
+files. Note, to generate CCS tables source files @option{-s} option
+should be added.
+
+@enumerate
+@item For the iso_8859_10.cct, iso_8859_13.cct, iso_8859_14.cct, iso_8859_15.cct,
+iso_8859_1.cct, iso_8859_2.cct, iso_8859_3.cct, iso_8859_4.cct,
+iso_8859_5.cct, iso_8859_6.cct, iso_8859_7.cct, iso_8859_8.cct,
+iso_8859_9.cct, iso_8859_11.cct, win_1250.cct, win_1252.cct, win_1254.cct,
+win_1256.cct, win_1258.cct, win_1251.cct,
+win_1253.cct, win_1255.cct, win_1257.cct,
+koi8_r.cct, koi8_ru.cct, koi8_u.cct, koi8_uni.cct, iso_ir_111.cct,
+big5.cct, cp775.cct, cp850.cct, cp852.cct, cp855.cct, cp866.cct, cns11643.cct
+files, only the @option{-i <SRC_FILE_NAME>} option was used.
+
+@item To generate the jis_x0208_1990.cct file, the
+@option{-i jis_x0208_1990.txt -x 2 -y 3} options were used.
+
+@item To generate the cns11643_plane1.cct file, the
+@option{-i cns11643.txt -p1 -N cns11643_plane1 -o cns11643_plane1.cct}
+options were used.
+
+@item To generate the cns11643_plane2.cct file, the
+@option{-i cns11643.txt -p2 -N cns11643_plane2 -o cns11643_plane2.cct}
+options were used.
+
+@item To generate the cns11643_plane14.cct file, the
+@option{-i cns11643.txt -p0xE -N cns11643_plane14 -o cns11643_plane14.cct}
+options were used.
+@end enumerate
+
+@*
+For more info about the 'mktbl.pl' options, see the 'mktbl.pl -h' output.
+
+@*
+It is assumed that CCS codes are 16 or fewer bits wide. If there are wider CCS codes
+in the CCS source file, the bits above the 16th define the plane (see the
+cns11643.txt CCS source file).
+
+@*
+Sometimes, it is impossible to map some CCS codes to the 16-bit UCS if, for example,
+several different CCS codes are mapped to one UCS-2 code or one CCS code is mapped to
+a pair of UCS-2 codes. In these cases, such CCS codes (@dfn{lost
+codes}) aren't just rejected but instead, they are mapped to the default
+UCS-2 code (which is currently the @kbd{?} character's code).
+
+
+
+
+
+@page
+@node CES converters
+@section CES converters
+@findex PCS
+@*
+Similar to the CCS tables, CES converters are also split into "from UCS"
+and "to UCS" parts. Depending on the iconv library configuration, these
+parts are enabled or disabled.
+
+@*
+The following is the list of CES converters which are currently present
+in the Newlib iconv library.
+
+@itemize @bullet
+@item
+@emph{euc} - supports the @emph{euc_jp}, @emph{euc_kr} and @emph{euc_tw}
+encodings. The @emph{euc} CES converter uses the @emph{table} and the
+@emph{us_ascii} CES converters.
+
+@item
+@emph{table} - this CES converter corresponds to "null" and just performs
+tables-based conversion using 8- and 16-bit CCS tables. This converter
+is also used by any other CES converter which needs the CCS table-based
+conversions. The @emph{table} converter is also responsible for .cct files
+loading.
+
+@item
+@emph{table_pcs} - this is the wrapper over the @emph{table} converter
+which is intended for 16-bit encodings which also use the @dfn{Portable
+Character Set} (@dfn{PCS}) which is the same as the @emph{US-ASCII}.
+This means that if the first byte of the CCS code is in the range of [0x00-0x7f],
+this is the 7-bit PCS code. Else, this is the 16-bit CCS code. Of course,
+the 16-bit codes must not contain bytes in the range of [0x00-0x7f].
+The @emph{big5} encoding uses the @emph{table_pcs} CES converter and the
+@emph{table_pcs} CES converter depends on the @emph{table} CES converter.
+
+@item
+@emph{ucs_2} - intended for the @emph{ucs_2}, @emph{ucs_2be} and
+@emph{ucs_2le} encodings support.
+
+@item
+@emph{ucs_4} - intended for the @emph{ucs_4}, @emph{ucs_4be} and
+@emph{ucs_4le} encodings support.
+
+@item
+@emph{ucs_2_internal} - intended for the @emph{ucs_2_internal} encoding support.
+
+@item
+@emph{ucs_4_internal} - intended for the @emph{ucs_4_internal} encoding support.
+
+@item
+@emph{us_ascii} - intended for the @emph{us_ascii} encoding support. In
+principle, the most natural way to support the @emph{us_ascii} encoding
+is to define the @emph{us_ascii} CCS and use the @emph{table} CES
+converter. But for optimization purposes, the specialized
+@emph{us_ascii} CES converter was created.
+
+@item
+@emph{utf_16} - intended for the @emph{utf_16}, @emph{utf_16be} and
+@emph{utf_16le} encodings support.
+
+@item
+@emph{utf_8} - intended for the @emph{utf_8} encoding support.
+@end itemize
+
+
+
+
+
+@page
+@node The encodings description file
+@section The encodings description file
+@findex encoding.deps description file
+@findex mkdeps.pl Perl script
+@*
+To simplify the process of adding support for new encodings, a lot of
+"glue" files are generated automatically.
+
+@*
+There is the 'encoding.deps' file in the @emph{lib/} subdirectory which
+is used to describe encoding's properties. The 'mkdeps.pl' Perl script
+uses 'encoding.deps' to generate the "glue" files.
+
+@*
+The 'encoding.deps' file is composed of sections, each section consists
+of entries, each entry contains some encoding/CES/CCS description.
+
+@*
+The 'encoding.deps' file's syntax is very simple. Currently only two
+sections are defined: @emph{ENCODINGS} and @emph{CES_DEPENDENCIES}.
+
+@*
+Each @emph{ENCODINGS} section's entry describes one encoding and
+contains the following information.
+
+@itemize @bullet
+@item
+Encoding name (the @emph{ENCODING} field). The name should
+be unique and only one name is possible.
+
+@item
+The encoding's CES converter name (the @emph{CES} field). Only one CES
+converter is allowed.
+
+@item
+The whitespace-separated list of CCS table names which are used by the
+encoding (the @emph{CCS} field).
+
+@item
+The whitespace-separated list of alias names (the @emph{ALIASES}
+field).
+@end itemize
+
+@*
+Note that all names in the 'encoding.deps' file must be in the normalized
+form.
+
+@*
+Each @emph{CES_DEPENDENCIES} section's entry describes dependencies of
+one CES converter. For example, the @emph{euc} CES converter depends on
+the @emph{table} and the @emph{us_ascii} CES converters since the
+@emph{euc} CES converter uses them. This means that both @emph{table}
+and @emph{us_ascii} CES converters should be linked if the @emph{euc}
+CES converter is enabled.
+
+@*
+The @emph{CES_DEPENDENCIES} section defines the following:
+
+@itemize @bullet
+@item
+the CES converter name for which the dependencies are defined in this
+entry (the @emph{CES} field);
+
+@item
+the whitespace-separated list of CES converters which are needed for
+this CES converter (the @emph{USED_CES} field).
+@end itemize
+
+@*
+The 'mkdeps.pl' Perl script automatically solves the following tasks.
+
+@itemize @bullet
+@item
+User works with the iconv library in terms of encodings and doesn't know
+anything about CES converters and CCS tables. The script automatically
+generates code which enables all needed CES converters and CCS tables
+for all encodings which were enabled by the user.
+
+@item
+The CES converters may have dependencies and the script automatically
+generates the code which handles these dependencies.
+
+@item
+The list of encoding's aliases is also automatically generated.
+
+@item
+The script uses a lot of macros in order to enable only the minimum set
+of code/data which is needed to support the requested encodings in the
+requested directions.
+@end itemize
+
+@*
+The 'mkdeps.pl' Perl script interprets the 'encoding.deps'
+file and generates the following files.
+
+@itemize @bullet
+@item
+@emph{lib/encnames.h} - this header file contains macro definitions for all
+encoding names
+
+@item
+@emph{lib/aliasesbi.c} - the array of encoding names and aliases. The array
+is used to find the name of requested encoding by its alias.
+
+@item
+@emph{ces/cesbi.c} - this file defines two arrays
+(@code{_iconv_from_ucs_ces} and @code{_iconv_to_ucs_ces}) which contain
+descriptions of enabled "from UCS" and "to UCS" CES converters and the
+names of encodings which are supported by these CES converters.
+
+@item
+@emph{ces/cesbi.h} - this file contains the macros which define
+the set of CES converters that should be enabled, given only the set of
+enabled encodings (through macros defined in the
+@emph{newlib.h} file). Note that one CES converter may handle several
+encodings.
+
+@item
+@emph{ces/cesdeps.h} - the CES converters dependencies are handled in
+this file.
+
+@item
+@emph{ccs/ccsdeps.h} - the array of linked-in CCS tables is defined
+here.
+
+@item
+@emph{ccs/ccsnames.h} - this header file contains macro definitions for all
+CCS names.
+
+@item
+@emph{encoding.aliases} - the list of supported encodings and their
+aliases which is intended for the Newlib configure scripts in order to
+handle the iconv-related configure script options.
+@end itemize
+
+
+
+
+
+@page
+@node How to add new encoding
+@section How to add new encoding
+@*
+First, the new encoding should be broken down into a CCS and a CES. Then,
+the process of adding the new encoding is split into the following activities.
+
+@enumerate
+@item Generate the .cct CCS file and the .c source file for the new
+encoding's CCS (if it isn't already present). To do this, the CCS source
+file must be available and the 'mktbl.pl' script should be used.
+
+@item Write the corresponding CES converter (if it isn't already
+present). Use the existing CES converters as an example.
+
+@item
+Add the corresponding entries to the 'encoding.deps' file and regenerate
+the autogenerated "glue" files using the 'mkdeps.pl' script.
+
+@item
+Don't forget to add entries to the newlib/newlib.hin file.
+
+@item
+Of course, the 'Makefile.am'-s should also be updated (if new files were
+added) and the 'Makefile.in'-s should be regenerated using the correct
+version of 'automake'.
+
+@item
+Don't forget to update the documentation (the list of
+supported encodings and CES converters).
+@end enumerate
+
+In case a new encoding doesn't fit the CES/CCS decomposition model or
+it is desired to add the specialized (non UCS-based) conversion support,
+the Newlib iconv library code should be upgraded.
+
+
+
+
+
+@page
+@node The locale support interfaces
+@section The locale support interfaces
+@*
+The newlib iconv library also has some interface functions (besides the
+@code{iconv}, @code{iconv_open} and @code{iconv_close} interfaces) which
+are intended for the Locale subsystem. All the locale-related code is
+placed in the @emph{lib/iconvnls.c} file.
+
+@*
+The following is the description of the locale-related interfaces:
+
+@itemize @bullet
+@item
+@code{_iconv_nls_open} - opens two iconv descriptors for "CCS ->
+wchar_t" and "wchar_t -> CCS" conversions. The normalized CCS name is
+passed in the function parameters. The @emph{wchar_t} characters encoding is
+either ucs_2_internal or ucs_4_internal depending on the size of
+@emph{wchar_t}.
+
+@item
+@code{_iconv_nls_conv} - the function is similar to the @code{iconv}
+function, but if there is no character in the output encoding which
+corresponds to the character in the input encoding, the default
+conversion isn't performed (the @code{iconv} function sets such output
+characters to the @kbd{?} symbol and this is the behavior which is
+specified in SUSv3).
+
+@item
+@code{_iconv_nls_get_state} - returns the current encoding's shift state
+(the @code{mbstate_t} object).
+
+@item
+@code{_iconv_nls_set_state} - sets the current encoding's shift state (the
+@code{mbstate_t} object).
+
+@item
+@code{_iconv_nls_is_stateful} - checks whether the encoding is stateful
+or stateless.
+
+@item
+@code{_iconv_nls_get_mb_cur_max} - returns the maximum length (the
+maximum bytes number) of the encoding's characters.
+@end itemize
+
+
+
+
+@page
+@node Contact
+@section Contact
+@*
+The author of the original BSD iconv library (Alexander Chuguev) no longer
+supports that code.
+
+@*
+Any questions regarding the iconv library may be forwarded to
+Artem B. Bityuckiy (dedekind@@oktetlabs.ru or dedekind@@mail.ru) as
+well as to the public Newlib mailing list.
+
diff --git a/newlib/libc/iconv/lib/iconvnls.c b/newlib/libc/iconv/lib/iconvnls.c
index db88916..1b42f2c 100644
--- a/newlib/libc/iconv/lib/iconvnls.c
+++ b/newlib/libc/iconv/lib/iconvnls.c
@@ -71,7 +71,7 @@ _DEFUN(_iconv_nls_construct_filename, (rptr, file, ext),
   int dirlen = strlen (dir);
 
   if ((path = _getenv_r (rptr, NLS_ENVVAR_NAME)) == NULL || *path == '\0')
-    path = NLS_DEFAULT_NLSPATH;
+    path = ICONV_DEFAULT_NLSPATH;
 
   len1 = strlen (path);
   len2 = strlen (file);
diff --git a/newlib/libc/iconv/lib/iconvnls.h b/newlib/libc/iconv/lib/iconvnls.h
index d31876b..f6d4866 100644
--- a/newlib/libc/iconv/lib/iconvnls.h
+++ b/newlib/libc/iconv/lib/iconvnls.h
@@ -1,5 +1,5 @@
-#ifndef __INCOV_ICONVNLS_H__
-#define __INCOV_ICONVNLS_H__
+#ifndef __ICONV_ICONVNLS_H__
+#define __ICONV_ICONVNLS_H__
 
 #include <newlib.h>
 
@@ -33,5 +33,5 @@
 # endif
 #endif /* _MB_CAPABLE */
 
-#endif /* !__INCOV_ICONVNLS_H__ */
+#endif /* !__ICONV_ICONVNLS_H__ */
 
diff --git a/newlib/libc/include/sys/iconvnls.h b/newlib/libc/include/sys/iconvnls.h
new file mode 100644
index 0000000..09ea183
--- /dev/null
+++ b/newlib/libc/include/sys/iconvnls.h
@@ -0,0 +1,77 @@
+/*
+ * Copyright (c) 2003-2004, Artem B. Bityuckiy.
+ * Rights transferred to Franklin Electronic Publishers.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+/*
+ * Functions, macros, etc implemented in iconv library but used by other
+ * NLS-related subsystems too.
+ */
+#ifndef __SYS_ICONVNLS_H__
+#define __SYS_ICONVNLS_H__
+
+#include <_ansi.h>
+#include <reent.h>
+#include <wchar.h>
+#include <iconv.h>
+
+/* Iconv data path environment variable name */
+#define NLS_ENVVAR_NAME "NLSPATH"
+/* Default NLSPATH value */
+#define ICONV_DEFAULT_NLSPATH "/usr/locale"
+/* Direction markers */
+#define ICONV_NLS_FROM 0
+#define ICONV_NLS_TO 1
+
+_VOID
+_EXFUN(_iconv_nls_get_state, (iconv_t cd, mbstate_t *ps, int direction));
+
+int
+_EXFUN(_iconv_nls_set_state, (iconv_t cd, mbstate_t *ps, int direction));
+
+int
+_EXFUN(_iconv_nls_is_stateful, (iconv_t cd, int direction));
+
+int
+_EXFUN(_iconv_nls_get_mb_cur_max, (iconv_t cd, int direction));
+
+size_t
+_EXFUN(_iconv_nls_conv, (struct _reent *rptr, iconv_t cd,
+                         _CONST char **inbuf, size_t *inbytesleft,
+                         char **outbuf, size_t *outbytesleft));
+
+_CONST char *
+_EXFUN(_iconv_nls_construct_filename, (struct _reent *rptr, _CONST char *file,
+                                       _CONST char *dir, _CONST char *ext));
+
+
+int
+_EXFUN(_iconv_nls_open, (struct _reent *rptr, _CONST char *encoding,
+                         iconv_t *towc, iconv_t *fromwc, int flag));
+
+char *
+_EXFUN(_iconv_resolve_encoding_name, (struct _reent *rptr, _CONST char *ca));
+
+#endif /* __SYS_ICONVNLS_H__ */
+
|
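The speed-optimized "UCS-2 -> CCS" lookup described in the iconv.tex hunk of this patch can be sketched in C for an 8-bit CCS. This is only an illustration under stated assumptions, not newlib's code: the documentation describes a single "from_ucs" array, while the hypothetical `struct from_ucs_8bit` below splits it into three named parts for readability, and `lookup_from_ucs` and `make_demo_table` are names invented for this example. The two demo mappings are taken from Windows-1251.

```c
#include <stdint.h>
#include <string.h>

#define NO_BLOCK 0xFFFFu /* heading entry for a subrange with no codes */
#define NO_CODE  0xFFu   /* block entry marking a gap (8-bit CCS)      */

/* Hypothetical, simplified view of an 8-bit "from_ucs" table.
   The documentation describes one flat array; the parts are split
   here only to keep the sketch readable. */
struct from_ucs_8bit {
  uint16_t ucs_for_ff;    /* UCS-2 code that maps to CCS code 0xFF        */
  uint16_t heading[256];  /* offset of each block in blocks[], or NO_BLOCK */
  const uint8_t *blocks;  /* concatenated 256-element blocks               */
};

/* Return the CCS code for UCS-2 code y, or -1 if y has no mapping. */
static int
lookup_from_ucs (const struct from_ucs_8bit *t, uint16_t y)
{
  unsigned blkn, xindex;
  uint16_t offset;
  uint8_t x;

  if (y == t->ucs_for_ff)          /* step 1: the special 0xFF code */
    return 0xFF;

  blkn = (y & 0xFF00u) >> 8;       /* step 2: block number */
  offset = t->heading[blkn];
  if (offset == NO_BLOCK)          /* step 3: subrange is empty */
    return -1;

  xindex = y & 0xFFu;              /* step 4: offset inside the block */
  x = t->blocks[offset + xindex];
  return x == NO_CODE ? -1 : x;    /* step 5: gap, or the CCS code */
}

/* Build a one-block demo table holding two Windows-1251 mappings:
   U+0410 -> 0xC0 and U+044F -> 0xFF. block must have 256 bytes. */
static void
make_demo_table (struct from_ucs_8bit *t, uint8_t *block)
{
  int i;

  memset (block, NO_CODE, 256);
  block[0x10] = 0xC0;              /* U+0410: block 0x04, index 0x10 */
  for (i = 0; i < 256; i++)
    t->heading[i] = NO_BLOCK;
  t->heading[0x04] = 0;            /* block 0x04 starts at offset 0 */
  t->ucs_for_ff = 0x044F;
  t->blocks = block;
}
```

The 0xFF special case exists because 0xFF doubles as the gap marker inside 8-bit blocks, so a genuine 0xFF CCS code cannot be stored in a block and is checked before the block lookup.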