author | Jeff Johnston <jjohnstn@redhat.com> | 2004-07-07 17:26:38 +0000 |
---|---|---|
committer | Jeff Johnston <jjohnstn@redhat.com> | 2004-07-07 17:26:38 +0000 |
commit | 6edb3da9ac699371c24bf92fa1df0966c68864b3 (patch) | |
tree | 95f5c608af18f4fe3d1ad0138b0c6b698353f5d6 /newlib | |
parent | 578a35608f7bf1737e7b267e73f9410ae8617420 (diff) | |
2004-07-07 Artem B. Bityuckiy <dedekind@oktetlabs.ru>
* libc/iconv/iconv.tex: Updated to represent recent changes.
* libc/iconv/lib/iconv.c: Documentation updated.
Diffstat (limited to 'newlib')
-rw-r--r-- | newlib/ChangeLog | 5 |
-rw-r--r-- | newlib/libc/iconv/iconv.tex | 884 |
-rw-r--r-- | newlib/libc/iconv/lib/iconv.c | 13 |
3 files changed, 853 insertions, 49 deletions
diff --git a/newlib/ChangeLog b/newlib/ChangeLog
index c2e8735..4c0d460 100644
--- a/newlib/ChangeLog
+++ b/newlib/ChangeLog
@@ -1,3 +1,8 @@
+2004-07-07 Artem B. Bityuckiy <dedekind@oktetlabs.ru>
+
+ * libc/iconv/iconv.tex: Updated to represent recent changes.
+ * libc/iconv/lib/iconv.c: Documentation updated.
+
 2004-07-07 Nick Clifton <nickc@redhat.com>
  * configure.host (newlib_cflags): Define PREFER_SIZE_OVER_SPEED
diff --git a/newlib/libc/iconv/iconv.tex b/newlib/libc/iconv/iconv.tex
index 5542f9b..d8aa8a1 100644
--- a/newlib/libc/iconv/iconv.tex
+++ b/newlib/libc/iconv/iconv.tex
@@ -1,42 +1,864 @@
 @node Iconv
-@chapter Character-set conversions (@file{iconv.h})
+@chapter Encoding conversions (@file{iconv.h})
 This chapter describes the Newlib iconv library. The iconv functions declarations are in @file{iconv.h}.
 @menu
-* iconv:: Character set conversion routines
-* iconv configuration:: Newlib iconv-specific configure options
+* iconv:: Encoding conversion routines
+* Introduction:: Introduction to iconv and encodings
+* Supported encodings:: The list of currently supported encodings
+* iconv design decisions:: General iconv library design issues and decisions
+* iconv configuration:: iconv-related configure script options
 @end menu
 @page
 @include iconv/iconv.def
 @page
-@node iconv configuration
-@section iconv configuration
-@findex iconv configuration
+@node Introduction
+@section Introduction
 @findex encoding
+@findex character set
+@findex charset
+@findex CES
+@findex CCS
+@*
+The iconv library is intended to convert characters from one encoding to
+another. It implements iconv(), iconv_open() and iconv_close() calls
+defined by the Single Unix Specification.
+
+@*
+In addition to these user-level interfaces, the iconv library also has
+several useful internal interfaces which are needed to support coding
+capabilities of the Locale infrastructure. Since Locale also needs to
+convert various character sets to and from Wide characters set, iconv
+library shares it's capabilities with Locale subsystem. Moreover, iconv
+supports several features which are only needed for Locale infrastructure
+(for example, the MB_CUR_MAX value).
+
+@*
+The Newlib iconv library was created using ideas of another iconv
+library implemented by Konstantin Chuguev (ver 2.0). Thus, the Newlib iconv
+library has double Copyright. The Newlib iconv library was rewritten from
+scratch by Artem B. Bityuckiy and contains a lot of improvements with respect
+to original iconv library.
+
+@*
+Terms like @dfn{encoding} or @dfn{character set} aren't well defined and
+are used with various meanings. The following is definitions of terms
+used in this documentation as well as in iconv library implementation:
+
+@itemize @bullet
+@item
+@dfn{encoding} - a machine representation of characters by means of bits;
+
+@item
+@dfn{Character Set} or @dfn{Charset} - just a collection of
+characters, i.e. encoding is a machine representation of character set;
+
+@item
+@dfn{CCS} (@dfn{Coded Character Set}) - a mapping from an character set to a
+set of integers @dfn{character codes};
+
+@item
+@dfn{CES} (@dfn{Character Encoding Scheme}) - a mapping from a set of character
+codes to a sequence of bytes;
+@end itemize
+
+@*
+Users usually deal with encodings, for example, KOI8-R, Unicode, UTF-8,
+ASCII, etc. Encodings are formed by the following chain:
+
+@enumerate
+@item
+User has a set of characters specific to his language (character set).
+
+@item
+Each character from this set uniquely numbered, resulting in CCS.
+
+@item
+Each number from CCS is converted to a sequence of bits or bytes by means
+of CES resulting in some encoding. Thus, CES may be considered as a
+function of CCS which produces some encoding. Note, that CES may be
+applied to more than one CCS.
+@end enumerate
+
+@*
+Thus, an encoding may be considered as one or more CCS + CES.
+ +@* +Sometimes, there is no CES and in such cases Encoding is equivalent to CCS, +e.g. KOI8-R or ASCII. + +@* +The example of more complicated encoding is UTF-8 which is the UCS +(or Unicode) CCS plus UTF-8 CES. + +@* +The following is a brief list of iconv library features: +@itemize +@item +Generic architecture +@item +Locale infrastructure support +@item +Automatic generation of code which handles various CES/CCS/Encoding/Names/Aliases +dependencies. +@item +The possibility to choose size- or speed-optimazed configuration +@item +The possibility to exclude almost all unneeded code from linking. +@end itemize + + + + +@page +@node Supported encodings +@section Supported encodings +@findex big5 +@findex cp775 +@findex cp850 +@findex cp852 +@findex cp855 +@findex cp866 +@findex euc_jp +@findex euc_kr +@findex euc_tw +@findex iso_8859_1 +@findex iso_8859_10 +@findex iso_8859_11 +@findex iso_8859_13 +@findex iso_8859_14 +@findex iso_8859_15 +@findex iso_8859_2 +@findex iso_8859_3 +@findex iso_8859_4 +@findex iso_8859_5 +@findex iso_8859_6 +@findex iso_8859_7 +@findex iso_8859_8 +@findex iso_8859_9 +@findex iso_ir_111 +@findex koi8_r +@findex koi8_ru +@findex koi8_u +@findex koi8_uni +@findex ucs_2 +@findex ucs_2_internal +@findex ucs_2be +@findex ucs_2le +@findex ucs_4 +@findex ucs_4_internal +@findex ucs_4be +@findex ucs_4le +@findex us_ascii +@findex utf_16 +@findex utf_16be +@findex utf_16le +@findex utf_8 +@findex win_1250 +@findex win_1251 +@findex win_1252 +@findex win_1253 +@findex win_1254 +@findex win_1255 +@findex win_1256 +@findex win_1257 +@findex win_1258 +@* +The following is a list of currently supported encodings. The first column +corresponds to encoding name, the second to the list of its aliases, third +- to its CES and CCS components names, fourth - to its short description. 
+ +@multitable @columnfractions .20 .26 .24 .30 +@item +Name +@tab +Aliases +@tab +CES/CCS +@tab +Short description +@item +@tab +@tab +@tab + + +@item +big5 +@tab +csbig5, big_five, bigfive, cn_big5, cp950 +@tab +table_pcs / big5, us_ascii +@tab +An encoding for Traditional Chinese. + + +@item +cp775 +@tab +ibm775, cspc775baltic +@tab +table / cp775 +@tab +An updated version of CP 437 that supports balitic languages. + + +@item +cp850 +@tab +ibm850, 850, cspc850multilingual +@tab +table / cp850 +@tab +IBM 850 - an updated version of CP 437 where several Latin 1 characters have been +added instead of some less-often used characters like line-drawing and greek ones. + + +@item +cp852 +@tab +ibm852, 852, cspcp852 +@tab +@tab +IBM 852 - an updated version of CP 437 where several Latin 2 characters have been added +instead of some less-often used characters like line-drawing and greek ones. + + +@item +cp855 +@tab +ibm855, 855, csibm855 +@tab +table / cp855 +@tab +IBM 855 - an updated version of CP 437 that supports Cyrillic. + + +@item +cp866 +@tab +866, IBM866, CSIBM866 +@tab +table / cp866 +@tab +IBM 866 - an updated version of CP 855 which followes the more logical Russian alphabet +ordering of the alternativny variant that is preferred by many Russian users. + + +@item +euc_jp +@tab +eucjp +@tab +euc / jis_x0208_1990, jis_x0201_1976, jis_x0212_1990 +@tab +EUC-JP - The EUC for Japanese. + + +@item +euc_kr +@tab +euckr +@tab +euc / ksx1001 +@tab +EUC-KR - The EUC for Korean. + + +@item +euc_tw +@tab +euctw +@tab +euc / cns11643_plane1, cns11643_plane2, cns11643_plane14 +@tab +EUC-TW - The EUC for Traditional Chinese. + + +@item +iso_8859_1 +@tab +iso8859_1, iso88591, iso_8859_1:1987, iso_ir_100, latin1, l1, ibm819, cp819, csisolatin1 +@tab +table / iso_8859_1 +@tab +ISO 8859-1:1987 - Latin 1, West European. 
+ + +@item +iso_8859_10 +@tab +iso_8859_10:1992, iso_ir_157, iso885910, latin6, l6, csisolatin6, iso8859_10 +@tab +table / iso_8859_10 +@tab +ISO 8859-10:1992 - Latin 6, Nordic. + + +@item +iso_8859_11 +@tab +iso8859_11, iso885911 +@tab +table / iso_8859_11 +@tab +ISO 8859-11 - Thai. + + +@item +iso_8859_13 +@tab +iso_8859_13:1998, iso8859_13, iso885913 +@tab +table / iso_8859_13 +@tab +ISO 8859-13:1998 - Latin 7, Baltic Rim. + + +@item +iso_8859_14 +@tab +iso_8859_14:1998, iso885914, iso8859_14 +@tab +table / iso_8859_14 +@tab +ISO 8859-14:1998 - Latin 8, Celtic. + + +@item +iso_8859_15 +@tab +iso885915, iso_8859_15:1998, iso8859_15, +@tab +table / iso_8859_15 +@tab +ISO 8859-15:1998 - Latin 9, West Europe, successor of Latin 1. + + +@item +iso_8859_2 +@tab +iso8859_2, iso88592, iso_8859_2:1987, iso_ir_101, latin2, l2, csisolatin2 +@tab +table / iso_8859_2 +@tab +ISO 8859-2:1987 - Latin 2, East European. + + +@item +iso_8859_3 +@tab +iso_8859_3:1988, iso_ir_109, iso8859_3, latin3, l3, csisolatin3, iso88593 +@tab +table / iso_8859_3 +@tab +ISO 8859-3:1988 - Latin 3, South European. + + +@item +iso_8859_4 +@tab +iso8859_4, iso88594, iso_8859_4:1988, iso_ir_110, latin4, l4, csisolatin4 +@tab +table / iso_8859_4 +@tab +ISO 8859-4:1988 - Latin 4, North European. + + +@item +iso_8859_5 +@tab +iso8859_5, iso88595, iso_8859_5:1988, iso_ir_144, cyrillic, csisolatincyrillic +@tab +table / iso_8859_5 +@tab +ISO 8859-5:1988 - Cyrillic. + + +@item +iso_8859_6 +@tab +iso_8859_6:1987, iso_ir_127, iso8859_6, ecma_114, asmo_708, arabic, csisolatinarabic, iso88596 +@tab +table / iso_8859_6 +@tab +ISO i8859-6:1987 - Arabic. + + +@item +iso_8859_7 +@tab +iso_8859_7:1987, iso_ir_126, iso8859_7, elot_928, ecma_118, greek, greek8, csisolatingreek, iso88597 +@tab +table / iso_8859_7 +@tab +ISO 8859-7:1987 - Greek. + + +@item +iso_8859_8 +@tab +iso_8859_8:1988, iso_ir_138, iso8859_8, hebrew, csisolatinhebrew, iso88598 +@tab +table / iso_8859_8 +@tab +ISO 8859-8:1988 - Hebrew. 
+ + +@item +iso_8859_9 +@tab +iso_8859_9:1989, iso_ir_148, iso8859_9, latin5, l5, csisolatin5, iso88599 +@tab +table / iso_8859_9 +@tab +ISO 8859-9:1989 - Latin 5, Turkish. + + +@item +iso_ir_111 +@tab +ecma_cyrillic, koi8_e, koi8e, csiso111ecmacyrillic +@tab +table / iso_ir_111 +@tab +ISO IR 111/ECMA Cyrillic. + + +@item +koi8_r +@tab +cskoi8r, koi8r, koi8 +@tab +table / koi8_r +@tab +RFC 1489 Cyrillic. + + +@item +koi8_ru +@tab +koi8ru +@tab +table / koi8_ru +@tab +Obsoleted Ukrainian. + + +@item +koi8_u +@tab +koi8u +@tab +table / koi8_u +@tab +RFC 2319 Ukrainian. + + +@item +koi8_uni +@tab +koi8uni +@tab +table / koi8_uni +@tab +KOI8 Unified. + + +@item +ucs_2 +@tab +ucs2, iso_10646_ucs_2, iso10646_ucs_2, iso_10646_ucs2, iso10646_ucs2, iso10646ucs2, csUnicode +@tab +ucs_2 / (UCS) +@tab +ISO-10646-UCS-2. Big Endian, NBSP is always interpreted as NBSP (BOM isn't supported). + + +@item +ucs_2_internal +@tab +ucs2_internal, ucs_2internal, ucs2internal +@tab +ucs_2_internal / (UCS) +@tab +ISO-10646-UCS-2 in system byte order. +NBSP is always interpreted as NBSP (BOM isn't supported). + + +@item +ucs_2be +@tab +ucs2be +@tab +ucs_2 / (UCS) +@tab +Big Endian version of ISO-10646-UCS-2 (in fact, equivalent to ucs_2). +Big Endian, NBSP is always interpreted as NBSP (BOM isn't supported). + + +@item +ucs_2le +@tab +ucs2le +@tab +ucs_2 / (UCS) +@tab +Little Endian version of ISO-10646-UCS-2. +Little Endian, NBSP is always interpreted as NBSP (BOM isn't supported). + + +@item +ucs_4 +@tab +ucs4, iso_10646_ucs_4, iso10646_ucs_4, iso_10646_ucs4, iso10646_ucs4, iso10646ucs4 +@tab +ucs_4 / (UCS) +@tab +ISO-10646-UCS-4. Big Endian, NBSP is always interpreted as NBSP (BOM isn't supported). + + +@item +ucs_4_internal +@tab +ucs4_internal, ucs_4internal, ucs4internal +@tab +ucs_4_internal / (UCS) +@tab +ISO-10646-UCS-4 in system byte order. +NBSP is always interpreted as NBSP (BOM isn't supported). 
+ + +@item +ucs_4be +@tab +ucs4be +@tab +ucs_4 / (UCS) +@tab +Big Endian version of ISO-10646-UCS-4 (in fact, equivalent to ucs_4). +Big Endian, NBSP is always interpreted as NBSP (BOM isn't supported). + + +@item +ucs_4le +@tab +ucs4le +@tab +ucs_4 / (UCS) +@tab +Little Endian version of ISO-10646-UCS-4. +Little Endian, NBSP is always interpreted as NBSP (BOM isn't supported). + + +@item +us_ascii +@tab +ansi_x3.4_1968, ansi_x3.4_1986, iso_646.irv:1991, ascii, iso646_us, us, ibm367, cp367, csascii +@tab +us_ascii / (ASCII) +@tab +7-bit ASCII. + + +@item +utf_16 +@tab +utf16 +@tab +utf_16 / (UCS) +@tab +RFC 2781 UTF-16. The very first NBSP code in stream is interpreted as BOM. + + +@item +utf_16be +@tab +utf16be +@tab +utf_16 / (UCS) +@tab +Big Endian version of RFC 2781 UTF-16. +NBSP is always interpreted as NBSP (BOM isn't supported). + + +@item +utf_16le +@tab +utf16le +@tab +utf_16 / (UCS) +@tab +Little Endian version of RFC 2781 UTF-16. +NBSP is always interpreted as NBSP (BOM isn't supported). + + +@item +utf_8 +@tab +utf8 +@tab +utf_8 / (UCS) +@tab +RFC 3629 UTF-8. + + +@item +win_1250 +@tab +cp1250 +@tab +@tab +Win-1250 Croatian. + + +@item +win_1251 +@tab +cp1251 +@tab +table / win_1251 +@tab +Win-1251 - Cyrillic. + + +@item +win_1252 +@tab +cp1252 +@tab +table / win_1252 +@tab +Win-1252 - Latin 1. + + +@item +win_1253 +@tab +cp1253 +@tab +table / win_1253 +@tab +Win-1253 - Greek. + + +@item +win_1254 +@tab +cp1254 +@tab +table / win_1254 +@tab +Win-1254 - Turkish. + + +@item +win_1255 +@tab +cp1255 +@tab +table / win_1255 +@tab +Win-1255 - Hebrew. + + +@item +win_1256 +@tab +cp1256 +@tab +table / win_1256 +@tab +Win-1256 - Arabic. + + +@item +win_1257 +@tab +cp1257 +@tab +table / win_1257 +@tab +Win-1257 - Baltic. + + +@item +win_1258 +@tab +cp1258 +@tab +table / win_1258 +@tab +Win-1258 - Vietnamese7 that supports Cyrillic. 
+@end multitable + + + + +@page +@node iconv design decisions +@section iconv design decisions +@findex CCS table +@findex CES converter +@* +The first iconv library design issue arises when considering the +following two design approaches: + +@enumerate +@item +Have modules which implement conversion from encoding A to encoding B +and vice versa, i.e., one conversion module relates to any two +encodings. +@item +Have modules which implement conversion from encoding A to fixed +encoding C and vice versa, i.e., on conversion module relates to any +one encoding A and one fixed encoding C. In this case, to convert from +encoding A to encoding B, two modules are needed in order to convert +from A to C and then from C to B. +@end enumerate + +@* +It's obvious, that we have a tradeoff between commonness/flexibility and +efficiency: the first method is more efficient since it converts +directly. But from other hand, it isn't so flexible since for each +encoding pair distinct module is needed. + +@* +The Newlib iconv uses the second method and always converts through 32 +bit UCS. But its design also allows to write specialized conversion +modules if the conversion speed is critical. + +@* +The second design issue is how to decompose encodings. +The Newlib iconv library uses the fact that any encoding may be +considered as one or more CCS plus CES. It also decomposes its +conversion modules on @dfn{CES converter} plus one or more @dfn{CCS +tables}. CCS tables maps CCS to UCS and vice versa, CES converters +map CCS to encoding and vice versa. + +@* +As an example, consider conversion from big5 encoding to EUC-TW +encoding. big5 encoding may be decomposed on ASCII and BIG5 CCSes plus +BIG5 CES. EUC-TW may be decomposed on CNS11643_PLANE1, CNS11643_PLANE2, +and CNS11643_PLANE14 CCSes plus EUC CES. 
+
+@*
+The euc_jp -> big5 conversion happens as follows:
+
+@enumerate
+@item
+EUC converter performs EUC-TW encoding to correspondent CCSes transformation
+(CNS11643_PLANE1, CNS11643_PLANE2 and CNS11643_PLANE14 CCSes);
+@item
+Obtained CCS codes are transformed to UCS codes using CNS11643_PLANE1,
+CNS11643_PLANE2 and CNS11643_PLANE14 CCS tables;
+@item
+Resulting UCS codes are transformed to ASCII and BIG5 codes using
+correspondent CCS tables;
+@item
+Obtained CCS codes are transformed to big5 encoding using correspondent
+CES converter.
+@end enumerate
+
+@*
+Analogously, the backward conversion is performed as follows:
+
+@enumerate
+@item
+BIG converter performs big5 encoding -> correspondent CCSes transformation
+(ASCII and BIG5 CCSes);
+@item
+Obtained CCS codes are transformed to UCS codes using ASCII and BIG5 CCS tables;
+@item
+Resulting UCS codes are transformed to ASCII and BIG5 codes using
+correspondent CCS tables;
+@item
+Obtained CCS codes are transformed to EUC-TW encoding using correspondent
+CES converter.
+@end enumerate
+
+@*
+Note, the above is just an example and real names (implemented in Newlib
+iconv) of CES converters and CCS tables are slightly different.
+
+@*
+The third design issue also relates to flexibility. Obviously, it isn't
+wanted to always link all CES converters and CCS tables to the library
+but instead, it is wanted to be able to load needed converters and tables
+dynamically on demand. This isn't a problem on "big" machines like PC
+but may be very problematical within "small" embedded systems.
+
 @*
-To enable iconv, the --enable-newlib-iconv configuration option should be
-used when configuring Newlib.
+Since the CCS tables are just data, it is possible to load them
+dynamically from external files. Instead, CES converters are algorithms
+and contain some code and the dynamic library loading capability is needed.
 @*
-Iconv library is intended to perform conversions from one encoding to
-another encoding. Thus, the only user-visible abstraction is encoding.
-To enable particular encoding support user should enable it using
-Newlib's configure script options. Encoding's support is divided into
-two parts: "to" and "from". For example, if it is only wanted to have
-UTF-8 -> UCS-4 coding capabilities, "from" UTF-8 and "to" UCS-4 support
-should be enabled. In this case backward (UCS-4 -> UTF-8) conversion
-won't be possible (iconv_open will return error). Such division on "to"
-and "from" parts helps to save memory.
+Apart from possible restrictions applied by embedded systems (too few
+RAM for example), the Newlib itself has no dynamic libraries support and,
+therefore, all CES converters which will ever be uses must be linked into
+the library. But the dynamic CCS tables loading is possible and is
+implemented in the Newlib iconv library and may be enabled via Newlib
+configure script options.
 @*
+The next design decision is the possibility to of fine iconv library
+configuring. This means, that iconv isn't always link all it's
+converters and tables (if no dynamical loading enabled) but instead, it
+gives the possibility to enable only those encodings which are planned
+to be used (see section about configure script options).
+
+@*
+Moreover, the Newlib iconv library configure options distinguish between
+coding directions. This means, that not only supported encodings are
+selectable, but the coding direction too. For example, if user wants
+configuration which allows conversions from UTF-8 to UTF-16 and he
+doesn't plan to use UTF-16 to UTF-8 conversions, he can enable exactly
+that conversion direction (i.e., no UTF-16 -> UTF-8 -related code will
+be included) thus saving some memory (note, that such technique allows to
+exclude one half of CCS table from linking which may be big enough).
+
+@*
+One more design decision is speed- and size- optimized tables. Used can
+select between them using s configure script option. Speed CCS tables
+are the same as Size ones in case of 8 bit CCS (e.g.m KOI8-R), but for 16
+bit CCS Size-optimized table may be in 1.5-2 time less then
+Speed-optimized ones. From the other hand, the conversion with speed
+tables is in several times faster.
+
+@*
+Its worth to stress, that new encodings support can't be
+dynamically added into already compiled Newlib library. Even if this
+needs only additional CCS table and iconv is configured to use external
+files with CCS tables (this isn't a fundamental restriction and the
+possibility to add new Table-based encodings support dynamically, by
+copying new .cct file, may be easily added).
+
+@*
+Theoretically, the compiled-in CCS tables may be more appropriate foe
+embedded solutions since they are read-only and are placed to ROM,
+whereas the dynamic loading needs more RAM. Moreover, in current
+implementation, distinct copy of CCS file is loaded for each fore each
+opened iconv descriptor even in case of the same encoding.
+This means, for example, that if two iconv descriptors for
+KOI8-R -> UCS-4BE and KOI8-R -> UTF-16BE are opened, two copies of
+koi8-r .cct file will be loaded (actually, iconv loads only needed part
+of these files).
+
+
+@page
+@node iconv configuration
+@section iconv configuration
+@findex iconv configuration
+@*
 To enable encoding support --enable-newlib-iconv-encodings configure script option should be used. This option accepts a comma-separated list
-of encodins that should be enabled. Option enables each encoding in both
+of encodings that should be enabled. Option enables each encoding in both
 ("to" and "from") directions.
 @*
@@ -56,30 +878,8 @@ code and data will be linked) is to configure Newlib with
 --enable-newlib-iconv-to-encodings=KOI8-R,ISO-8859-5
 @*
-There is one more configue script option for iconv library:
---enable-newlib-iconv-external-ccs. This options enables iconv's
-capabilities to work with external CCS files. Exteral CCS files are just
-conversion tables used by iconv. Without this option all conversion
-tables are linked-in and occupy a lot of ROM. If target system has
-some fyle-system, it can benefit using external CCS files which are
-loaded on iconv_open and unloaded on iconv_close. But this way require
-more RAM. Moreover, in current implementation, distinct copy of CCS file
-is loaded for each fore each opended iconv decriptor for the same
-encoding. This means that if, for example, two iconv descriptors for
-KOI8-R -> UCS-4BE and KOI8-R -> UTF-16BE are opened, two copies of
-koi8-r.cct file will be loaded.
-
-@*
-Note: not evry encoding needs CCS tiles. For example, UTF-8, UTF-16,
-UCS-2, UCS-4 doesn't use such files at all.
-
-@*
-Note: CCS file contains a number of tables, and only several needed tables
-are loaded from this file. This means, that there is a possibility to save
-some "fyle-system space" not including unneeded tables to that CCS
-files. Such task may be performed using "mktbl.pl" Perl script
-destributed with iconv library.
+--enable-newlib-iconv-external-ccs option enables iconv's
+capabilities to work with external CCS files.
 @*
 Note: CCS files are searched by iconv_open in $NLSPATH/iconv_data/ directory.
-
diff --git a/newlib/libc/iconv/lib/iconv.c b/newlib/libc/iconv/lib/iconv.c
index 4e265f5..ee7124b 100644
--- a/newlib/libc/iconv/lib/iconv.c
+++ b/newlib/libc/iconv/lib/iconv.c
@@ -97,18 +97,18 @@ TRAD_SYNOPSIS
 DESCRIPTION
 The function <<iconv>> converts characters from <[in]> which are in one
-character set and converts them to characters of another character set,
-outputting them to <[out]>. The value <[inleft]> specifies the number
-of input bytes to convert whereas the value <[outleft]> specifies the
-size remaining in the <[out]> buffer. The conversion descriptor <[cd]>
-specifies the conversion being performed and is created via <<iconv_open>>.
+encoding to characters of another encoding, outputting them to <[out]>.
+The value <[inleft]> specifies the number of input bytes to convert whereas
+the value <[outleft]> specifies the size remaining in the <[out]> buffer.
+The conversion descriptor <[cd]> specifies the conversion being performed
+and is created via <<iconv_open>>.
 An <<iconv>> conversion stops if: the input bytes are exhausted, the output buffer is full, an invalid input character sequence occurs, or the conversion specifier is invalid.
 The function <<iconv_open>> is used to specify a conversion from one
-character set: <[from]> to another: <[to]>. The result of the call is
+encoding: <[from]> to another: <[to]>. The result of the call is
 to create a conversion specifier that can be used with <<iconv>>.
 The function <<iconv_close>> is used to close a conversion specifier after
@@ -346,4 +346,3 @@ _DEFUN(_iconv_close_r, (rptr, cd),
   return res;
 }
 #endif /* !_REENT_ONLY */
-
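
For readers who want to try the calls documented in the iconv.c hunk above, the following is a minimal sketch and not code from this commit. The helper name `utf8_to_ucs4le` is hypothetical, and the encoding name spellings "UTF-8" and "UCS-4LE" are assumptions about what the iconv implementation in use accepts. On Newlib, both directions would presumably also need to be enabled at configure time.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not part of Newlib): converts `inlen` bytes of UTF-8
   text into UCS-4LE. Returns the number of bytes written to `out`, or
   (size_t)-1 if the descriptor cannot be opened or the conversion fails. */
static size_t utf8_to_ucs4le(const char *in, size_t inlen,
                             char *out, size_t outcap)
{
    assert(in != NULL && out != NULL);

    /* iconv_open() takes the destination encoding first, then the source.
       The name spellings used here are an assumption. */
    iconv_t cd = iconv_open("UCS-4LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;      /* iconv() takes a non-const char ** */
    size_t inleft = inlen;       /* input bytes still to convert */
    char *outp = out;
    size_t outleft = outcap;     /* space remaining in the output buffer */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;       /* errno is E2BIG, EILSEQ or EINVAL */
    return outcap - outleft;     /* bytes actually produced */
}
```

Where both encodings are available, passing the two bytes 0xD0 0x9F (U+041F) should yield the four bytes 1F 04 00 00. A caller that cannot size the output buffer in advance would instead loop on `E2BIG`, draining the buffer between calls.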