Diffstat (limited to 'newlib/libc')
-rw-r--r-- newlib/libc/iconv/iconv.tex 424
1 files changed, 421 insertions, 3 deletions
diff --git a/newlib/libc/iconv/iconv.tex b/newlib/libc/iconv/iconv.tex
index 3d3621e..9d524ce 100644
--- a/newlib/libc/iconv/iconv.tex
+++ b/newlib/libc/iconv/iconv.tex
@@ -1,14 +1,432 @@
@node Iconv
@chapter Character-set conversions (@file{iconv.h})
-This chapter describes the iconv character-set conversion functions.
-The corresponding declarations are in
+This chapter describes the Newlib iconv library.
+The iconv function declarations are in
@file{iconv.h}.
@menu
-* iconv:: Character set conversion routines
+* iconv:: Character set conversion routines
+* iconv architecture:: Architecture of Newlib iconv library
+* iconv configuration:: Newlib iconv-specific configure options
+* Generating CCS tables:: How to generate CCS tables
+* Adding new converter:: Steps on adding a new converter
@end menu
@page
@include iconv/iconv.def
+@page
+@node iconv architecture
+@section iconv architecture
+@findex iconv architecture
+@findex encoding
+@findex CCS
+@findex CES
+@findex iconv converter
+@*
+@itemize @bullet
+@item
+Encoding - a rule to represent computer text by means of bits and bytes.
+@item
+CCS (Coded Character Set) - a mapping from an abstract character set
+to a set of non-negative integers (character codes).
+@item
+CES (Character Encoding Scheme) - a mapping from a sequence of character
+codes to a sequence of bytes.
+@end itemize
+
+@*
+Examples of CCS: ASCII, ISO-8859-x, KOI8-R, KSX-1001, GB-2312.@*
+Examples of CES: UTF-8, UTF-16, EUC-JP, ISO-2022-JP.
+
+@*
+The iconv library is used to convert an array of characters in one encoding
+to an array of characters in another encoding.
+
+@*
+From a user's point of view, the iconv library is a set of converters. Each converter
+corresponds to one encoding (e.g., KOI8-R converter, UTF-8 converter).
+Internally, a converter is built from CCS tables and CES modules, as described below.
+
+@*
+The iconv library always performs conversions through UCS-32: i.e., to convert
+from A to B, the iconv library first converts A to UCS-32, and then UCS-32 to B.
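From the caller's side the intermediate UCS-32 step is not visible: a conversion is requested by naming the source and target encodings. The following is a minimal sketch using only the standard iconv_open, iconv and iconv_close calls declared in iconv.h. The helper name convert_string is an assumption made for this illustration and is not part of the library.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Newlib): convert the NUL-terminated
   string `in` from encoding `from` to encoding `to`.
   Returns the number of bytes written to `out`, or -1 on failure. */
static long convert_string(const char *to, const char *from,
                           const char *in, char *out, size_t outsize)
{
    assert(to != NULL && from != NULL && in != NULL && out != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;                 /* converter pair not available */

    char *inp = (char *)in;        /* iconv() takes char **; input is not modified */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize;

    /* Convert the whole input in one call. */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);

    /* For stateful encodings, emit any bytes needed to return
       to the initial shift state. */
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);

    iconv_close(cd);
    return rc == (size_t)-1 ? -1 : (long)(outsize - outleft);
}
```

For example, calling convert_string("UTF-8", "KOI8-R", ...) on six bytes of KOI8-R Cyrillic text fills the output buffer with 12 bytes of UTF-8, provided both converters are linked in or loadable.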
+
+@*
+Each encoding consists of CES and CCS. CCS may be represented as data tables
+but CES always implies some code (algorithm). Iconv uses CCS tables
+to map from some encoding to UCS-32. CCS tables are placed into
+the iconv/ccs subdirectory of newlib. The iconv code also uses CES
+modules which can convert some CCS to and from UCS-32. CES modules are placed
+in the iconv/ces subdirectory.
+
+@*
+Some encodings have CES = CCS (e.g., KOI8-R). For such encodings iconv uses
+special subroutines which perform simple table conversions (ccs_table.c).
+
+@*
+Among specialized CES modules, the iconv library has
+generic support for EUC and ISO-2022-family encodings (ces_euc.c and
+ces_iso2022.c).
+
+@*
+To enable iconv to work with CCS or CES-based encodings, the corresponding
+CCS table or CES module should be linked with Newlib. The iconv library
+can also load CCS tables dynamically from external files (.cct files from
+the iconv/ccs/binary subdirectory). CES modules, on the other hand, can't
+be dynamically loaded.
+
+@*
+Each iconv converter has one name and a set of aliases. The list of
+aliases for each converter's name is in the iconv/charset.aliases file.
+Note: iconv always normalizes converter names and aliases before using them.
+
+@page
+@node iconv configuration
+@section iconv configuration
+@findex iconv configuration
+@findex iconv converter
+@*
+To enable iconv, the @option{--enable-newlib-iconv} configuration option should be
+used when configuring Newlib.
+
+@*
+To link a specific converter (CCS table or CES module) into Newlib, the
+@option{--enable-newlib-builtin-converters} option should be used. A
+comma-separated list of converters can be passed with this option
+(e.g., @option{--enable-newlib-builtin-converters=koi8-r,euc-jp} to link the KOI8-R
+and EUC-JP converters). Either converter names or aliases may be used.
+
+@*
+If the target system has a file system accessible by Newlib, table-based
+converters may be loaded dynamically from external files. The iconv
+code tries to load files from the iconv_data subdirectory of the directory
+specified by the NLSPATH environment variable.
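The lookup location described above can be sketched as follows. This is an illustration only: the helper names cct_path_in and cct_path are hypothetical, and the assumption that the file is named after the converter with a .cct suffix is inferred from the text, not taken from the Newlib sources.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build "<base>/iconv_data/<name>.cct" in `out`.
   Returns 0 on success, -1 if the result does not fit. */
static int cct_path_in(const char *base, const char *name,
                       char *out, size_t outsize)
{
    int n = snprintf(out, outsize, "%s/iconv_data/%s.cct", base, name);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}

/* Hypothetical helper: same as above, taking the base directory from
   the NLSPATH environment variable. Returns -1 if NLSPATH is unset or empty. */
static int cct_path(const char *name, char *out, size_t outsize)
{
    const char *base = getenv("NLSPATH");
    if (base == NULL || *base == '\0')
        return -1;
    return cct_path_in(base, name, out, outsize);
}
```

Separating the environment lookup from the path construction keeps the string handling easy to check on targets where NLSPATH is not set.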
+
+@*
+Since Newlib has no generic dynamic module load support, CES-based converters
+can't be dynamically loaded and should be linked in.
+
+@page
+@node Generating CCS tables
+@section Generating CCS tables
+@*
+CCS tables are placed in the ccs subdirectory of the iconv directory.
+This subdirectory contains .cct and .c files. The .cct files are for
+dynamic loading whereas the .c files are for static linking with Newlib.
+Both .c and .cct files are generated by the 'iconv_mktbl' Perl script
+from special source files (referred to here as .txt files). The 'iconv_mktbl'
+script can be found in the iconv/ccs subdirectory. Input .txt files can be
+found at the Unicode.org site or at other locations on the web.
+
+@*
+The .c files are linked with Newlib if the corresponding 'configure' script
+option was given. This is needed to use iconv on targets without file system
+support. If a CCS table isn't configured to be linked, the iconv library
+tries to load it dynamically from a corresponding .cct file.
+
+@*
+The following are commands to build .c and .cct CCS table files from .txt
+files for several supported encodings.
+
+@*
+@itemize
+@item
+cp775:@*
+iconv_mktbl -Co cp775.c cp775.txt@*
+iconv_mktbl -o cp775.cct cp775.txt
+@end itemize
+
+@itemize
+@item
+cp850:@*
+iconv_mktbl -Co cp850.c cp850.txt@*
+iconv_mktbl -o cp850.cct cp850.txt
+@end itemize
+
+@itemize
+@item
+cp852:@*
+iconv_mktbl -Co cp852.c cp852.txt@*
+iconv_mktbl -o cp852.cct cp852.txt
+@end itemize
+
+@itemize
+@item
+cp855:@*
+iconv_mktbl -Co cp855.c cp855.txt@*
+iconv_mktbl -o cp855.cct cp855.txt
+@end itemize
+
+@itemize
+@item
+cp866@*
+iconv_mktbl -Co cp866.c cp866.txt@*
+iconv_mktbl -o cp866.cct cp866.txt
+@end itemize
+
+@itemize
+@item
+iso-8859-1@*
+iconv_mktbl -Co iso-8859-1.c iso-8859-1.txt@*
+iconv_mktbl -o iso-8859-1.cct iso-8859-1.txt
+@end itemize
+
+@itemize
+@item
+iso-8859-4@*
+iconv_mktbl -Co iso-8859-4.c iso-8859-4.txt@*
+iconv_mktbl -o iso-8859-4.cct iso-8859-4.txt
+@end itemize
+
+@itemize
+@item
+iso-8859-5@*
+iconv_mktbl -Co iso-8859-5.c iso-8859-5.txt@*
+iconv_mktbl -o iso-8859-5.cct iso-8859-5.txt
+@end itemize
+
+@itemize
+@item
+iso-8859-2@*
+iconv_mktbl -Co iso-8859-2.c iso-8859-2.txt@*
+iconv_mktbl -o iso-8859-2.cct iso-8859-2.txt
+@end itemize
+
+@itemize
+@item
+iso-8859-15@*
+iconv_mktbl -Co iso-8859-15.c iso-8859-15.txt@*
+iconv_mktbl -o iso-8859-15.cct iso-8859-15.txt
+@end itemize
+
+@itemize
+@item
+big5@*
+iconv_mktbl -Co big5.c big5.txt@*
+iconv_mktbl -o big5.cct big5.txt
+@end itemize
+
+@itemize
+@item
+ksx1001@*
+iconv_mktbl -Co ksx1001.c ksx1001.txt@*
+iconv_mktbl -o ksx1001.cct ksx1001.txt
+@end itemize
+
+@itemize
+@item
+gb_2312@*
+iconv_mktbl -Co gb_2312-80.c gb_2312-80.txt@*
+iconv_mktbl -o gb_2312-80.cct gb_2312-80.txt
+@end itemize
+
+@itemize
+@item
+jis_x0201@*
+iconv_mktbl -Co jis_x0201.c jis_x0201.txt@*
+iconv_mktbl -o jis_x0201.cct jis_x0201.txt
+@end itemize
+
+@itemize
+@item
+shift_jis@*
+iconv_mktbl -Co shift_jis.c shift_jis.txt@*
+iconv_mktbl -o shift_jis.cct shift_jis.txt
+@end itemize
+
+@itemize
+@item
+jis_x0208@*
+iconv_mktbl -C -c 1 -u 2 -o jis_x0208-1983.c jis_x0208-1983.txt@*
+iconv_mktbl -c 1 -u 2 -o jis_x0208-1983.cct jis_x0208-1983.txt
+@end itemize
+
+@itemize
+@item
+jis_x0212@*
+iconv_mktbl -Co jis_x0212-1990.c jis_x0212-1990.txt@*
+iconv_mktbl -o jis_x0212-1990.cct jis_x0212-1990.txt
+@end itemize
+
+@itemize
+@item
+cns11643-plane1@*
+iconv_mktbl -C -p 0x1 -o cns11643-plane1.c cns11643.txt@*
+iconv_mktbl -p 0x1 -o cns11643-plane1.cct cns11643.txt
+@end itemize
+
+@itemize
+@item
+cns11643-plane2@*
+iconv_mktbl -C -p 0x2 -o cns11643-plane2.c cns11643.txt@*
+iconv_mktbl -p 0x2 -o cns11643-plane2.cct cns11643.txt
+@end itemize
+
+@itemize
+@item
+cns11643-plane14@*
+iconv_mktbl -C -p 0xE -o cns11643-plane14.c cns11643.txt@*
+iconv_mktbl -p 0xE -o cns11643-plane14.cct cns11643.txt
+@end itemize
+
+@itemize
+@item
+koi8-r@*
+iconv_mktbl -Co koi8-r.c koi8-r.txt@*
+iconv_mktbl -o koi8-r.cct koi8-r.txt
+@end itemize
+
+@itemize
+@item
+koi8-u@*
+iconv_mktbl -Co koi8-u.c koi8-u.txt@*
+iconv_mktbl -o koi8-u.cct koi8-u.txt
+@end itemize
+
+@itemize
+@item
+us-ascii@*
+iconv_mktbl -Cao us-ascii.c iso-8859-1.txt@*
+iconv_mktbl -ao us-ascii.cct iso-8859-1.txt
+@end itemize
+
+@*
+Source files for CCS tables can be taken from at least two places:
+
+@*
+@enumerate
+@item
+http://www.unicode.org/Public/MAPPINGS/ contains a lot of encoding
+map files.
+@item
+http://www.dante.net/staff/konstantin/FreeBSD/iconv/ contains original
+iconv sources and encoding map files.
+@end enumerate
+
+@*
+The following are URLs where source files for some of the CCS tables
+are found:
+
+@itemize
+@item
+big5:@*
+http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT
+@end itemize
+
+@itemize
+@item
+cns11643_plane14, cns11643_plane1 and cns11643_plane2:@*
+http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/CNS11643.TXT
+@end itemize
+
+@itemize
+@item
+cp775, cp850, cp852, cp855, cp866:@*
+http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/
+@end itemize
+
+@itemize
+@item
+gb_2312_80:@*
+http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/GB/GB2312.TXT
+@end itemize
+
+@itemize
+@item
+iso_8859_15, iso_8859_1, iso_8859_2, iso_8859_4, iso_8859_5:@*
+http://www.unicode.org/Public/MAPPINGS/ISO8859/
+@end itemize
+
+@itemize
+@item
+jis_x0201, jis_x0208_1983, jis_x0212_1990, shift_jis@*
+http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0201.TXT
+@end itemize
+
+@itemize
+@item
+koi8_r@*
+http://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT
+@end itemize
+
+@itemize
+@item
+ksx1001@*
+http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/KSX1001.TXT
+@end itemize
+
+@itemize
+@item
+koi8-u can be obtained from the original FreeBSD iconv library distribution:@*
+http://www.dante.net/staff/konstantin/FreeBSD/iconv/
+@end itemize
+
+@*
+Moreover, http://www.dante.net/staff/konstantin/FreeBSD/iconv/ contains a
+lot of additional CCS tables that you can use with Newlib (iso-2022 and
+RFC1345 encodings).
+
+@page
+@node Adding new converter
+@section Adding a new iconv converter
+@*
+The following steps should be taken to add a new iconv converter:
+
+@*
+@enumerate
+@item
+The converter's name and list of aliases should be added to
+the iconv/charset.aliases file.
+@item
+All iconv converters are protected by a _ICONV_CONVERTER_XXX
+macro, where XXX is the converter name. This protection macro should be added to
+the newlib/newlib.hin file.
+@item
+The converter's name and aliases should also be registered in the
+_iconv_builtin_aliases table in iconv/lib/bialiasesi.c. The list should be
+protected by the corresponding macro mentioned above.
+@item
+If a new converter is just a CCS table, the corresponding .cct and .c files
+should be added to the iconv/ccs/ subdirectory. The name of the files
+should be equivalent to the normalized encoding name. The 'iconv_mktbl'
+Perl script (found in iconv/ccs) may
+be used to generate such files. The file's name should be added to
+iconv/ccs/Makefile.am and iconv/ccs/binary/Makefile.am files and then
+automake should be used to regenerate the Makefile.in files.
+@item
+If a new converter has a CES algorithm, the appropriate file should be
+added to the iconv/ces/ subdirectory. The name of the file again should be
+equivalent to the normalized encoding name.
+@item
+If a converter is an EUC or ISO-2022-family CES, then the converter
+is just an array listing the CCS tables it uses (see ccs/euc-jp.c for an
+example). This is because iconv already has EUC and ISO-2022 support. The
+CCS tables used should be provided in iconv/ccs/.
+@item
+If a converter isn't an EUC or ISO-2022-based CES, the following functions
+should be provided (see utf-8.c for an example):
+@enumerate @minus
+@item A function to convert from new CES to UCS-32;
+@item A function to convert from UCS-32 to new CES;
+@item An 'init' function;
+@item A 'close' function;
+@item A 'reset' function to reset shift state for stateful CES.
+@end enumerate
+
+@*
+All these functions are registered into a 'struct iconv_ces_desc' object.
+The name of the object should be _iconv_ces_module_XXX, where XXX is the
+name of the converter.
+@item
+For CES converters, the corresponding 'struct iconv_ces_desc' reference should
+be added into the iconv/lib/bices.c file.
+
+@*
+For CCS converters, the corresponding table reference should be added into
+the iconv/lib/biccs.c file.
+@end enumerate
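The five functions listed above and their registration in a descriptor object can be pictured with the sketch below. It is a simplified illustration under stated assumptions: the type struct example_ces_desc, its field names and the function signatures are all hypothetical and are not the actual layout of 'struct iconv_ces_desc', which should be taken from the Newlib sources (utf-8.c is the example the text points to). The toy CES treats each byte as its own character code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor type, for illustration only. */
struct example_ces_desc {
    int    (*init)(void **state);   /* allocate per-conversion state */
    void   (*close)(void *state);   /* release that state */
    void   (*reset)(void *state);   /* return to the initial shift state */
    /* Decode one character; returns bytes consumed, 0 if input is empty. */
    size_t (*to_ucs)(void *state, const unsigned char *in, size_t inlen,
                     uint32_t *ucs);
    /* Encode one character; returns bytes written, 0 if it cannot be
       represented or the output has no room. */
    size_t (*from_ucs)(void *state, uint32_t ucs, unsigned char *out,
                       size_t outlen);
};

/* Toy stateless CES: needs no state, so init/close/reset do nothing. */
static int toy_init(void **state) { *state = NULL; return 0; }
static void toy_close(void *state) { (void)state; }
static void toy_reset(void *state) { (void)state; }

static size_t toy_to_ucs(void *state, const unsigned char *in, size_t inlen,
                         uint32_t *ucs)
{
    (void)state;
    if (inlen == 0)
        return 0;
    *ucs = in[0];                   /* byte value is the character code */
    return 1;
}

static size_t toy_from_ucs(void *state, uint32_t ucs, unsigned char *out,
                           size_t outlen)
{
    (void)state;
    if (ucs > 0xFF || outlen == 0)
        return 0;                   /* not representable, or no room */
    out[0] = (unsigned char)ucs;
    return 1;
}

/* One descriptor object per converter, analogous in spirit to the
   _iconv_ces_module_XXX objects described above. */
const struct example_ces_desc example_ces_module_toy = {
    toy_init, toy_close, toy_reset, toy_to_ucs, toy_from_ucs
};
```

Collecting the entry points in one object lets a table of built-in converters refer to each CES by a single symbol.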
+