Diffstat (limited to 'manual')
-rw-r--r-- manual/libc.texinfo    8
-rw-r--r-- manual/locale.texi     2
-rw-r--r-- manual/message.texi 1185
-rw-r--r-- manual/search.texi     2
-rw-r--r-- manual/startup.texi   85
-rw-r--r-- manual/string.texi     4
6 files changed, 1276 insertions, 10 deletions
diff --git a/manual/libc.texinfo b/manual/libc.texinfo
index cb1769f..6a936fd 100644
--- a/manual/libc.texinfo
+++ b/manual/libc.texinfo
@@ -122,6 +122,8 @@ of the GNU C Library.
* Extended Characters:: Support for extended character sets.
* Locales:: The country and language can affect
the behavior of library functions.
+* Message Translation:: How to make the program speak the user's
+ language.
* Searching and Sorting:: General searching and sorting functions.
* Pattern Matching:: Matching wildcards and regular expressions,
and shell-style ``word expansion''.
@@ -314,6 +316,11 @@ Locales and Internationalization
* Standard Locales:: Locale names available on all systems.
* Numeric Formatting:: How to format numbers for the chosen locale.
+Message Translation
+
+* Message catalogs a la X/Open:: The @code{catgets} family of functions.
+* The Uniforum approach:: The @code{gettext} family of functions.
+
Searching and Sorting
* Comparison Functions:: Defining how to compare two objects.
@@ -975,6 +982,7 @@ Porting the GNU C Library
@include time.texi
@include mbyte.texi
@include locale.texi
+@include message.texi
@include setjmp.texi
@include signal.texi
@include startup.texi
diff --git a/manual/locale.texi b/manual/locale.texi
index 1866c66..dfc9117 100644
--- a/manual/locale.texi
+++ b/manual/locale.texi
@@ -1,4 +1,4 @@
-@node Locales, Searching and Sorting, Extended Characters, Top
+@node Locales, Message Translation, Extended Characters, Top
@chapter Locales and Internationalization
Different countries and cultures have varying conventions for how to
diff --git a/manual/message.texi b/manual/message.texi
new file mode 100644
index 0000000..7640e21
--- /dev/null
+++ b/manual/message.texi
@@ -0,0 +1,1185 @@
+@node Message Translation
+@chapter Message Translation
+
+The program's interface with the human should be designed to ease the
+human's task. One of the possibilities is to use messages in whatever
+language the user prefers.
+
+Printing messages in different languages can be implemented in different
+ways. One could add all the different languages in the source code and
+choose among the variants every time a message has to be printed. This
+is certainly not a good solution since extending the set of languages is
+difficult (the code must be changed) and the code itself can become
+really big with dozens of message sets.
+
+A better solution is to keep the message sets for each language in
+separate files which are loaded at runtime depending on the language
+selection of the user.
+
+The GNU C Library provides two different sets of functions to support
+message translation. The problem is that neither of the interfaces is
+officially defined by the POSIX standard. The @code{catgets} family of
+functions is defined in the X/Open standard but this is derived from
+industry decisions and therefore not necessarily based on reasonable
+decisions.
+
+As mentioned above the message catalog handling provides easy
+extendibility by using external data files which contain the message
+translations. I.e., these files contain for each of the messages used
+in the program a translation for the appropriate language. So the tasks
+of the message handling functions are
+
+@itemize @bullet
+@item
+locate the external data file with the appropriate translations.
+@item
+load the data and make it possible to address the messages
+@item
+map a given key to the translated message
+@end itemize
+
+The two approaches mainly differ in the implementation of this last
+step. The design decisions made for this influence everything else.
+
+@menu
+* Message catalogs a la X/Open:: The @code{catgets} family of functions.
+* The Uniforum approach:: The @code{gettext} family of functions.
+@end menu
+
+
+@node Message catalogs a la X/Open
+@section X/Open Message Catalog Handling
+
+The @code{catgets} functions are based on the simple scheme:
+
+@quotation
+Associate every message to translate in the source code with a unique
+identifier. To retrieve a message from a catalog file solely the
+identifier is used.
+@end quotation
+
+This means for the author of the program that s/he will have to make
+sure the meaning of the identifier in the program code and in the
+message catalogs is always the same.
+
+Before a message can be translated the catalog file must be located.
+The user of the program must be able to guide the responsible function
+to find whatever catalog the user wants. This is independent of what
+the programmer had in mind.
+
+All the types, constants and functions for the @code{catgets} functions
+are defined/declared in the @file{nl_types.h} header file.
+
+@menu
+* The catgets Functions:: The @code{catgets} function family.
+* The message catalog files:: Format of the message catalog files.
+* The gencat program:: How to generate message catalog files which
+ can be used by the functions.
+* Common Usage:: How to use the @code{catgets} interface.
+@end menu
+
+
+@node The catgets Functions
+@subsection The @code{catgets} function family
+
+@comment nl_types.h
+@comment X/Open
+@deftypefun nl_catd catopen (const char *@var{cat_name}, int @var{flag})
+The @code{catopen} function tries to locate the message data file named
+@var{cat_name} and loads it when found. The return value is of an
+opaque type and can be used in calls to the other functions to refer to
+this loaded catalog.
+
+The return value is @code{(nl_catd) -1} in case the function failed and
+no catalog was loaded. The global variable @var{errno} contains a code
+for the error causing the failure. But even if the function call
+succeeded this does not mean that all messages can be translated.
+
+Locating the catalog file must happen in a way which lets the user of
+the program influence the decision. It is up to the user to decide
+about the language to use and sometimes it is useful to use alternate
+catalog files. All this can be specified by the user by setting some
+environment variables.
+
+The first problem is to find out where all the message catalogs are
+stored. Every program could have its own place to keep all the
+different files but usually the catalog files are grouped by languages
+and the catalogs for all programs are kept in the same place.
+
+@cindex NLSPATH environment variable
+To tell the @code{catopen} function where the catalog for the program
+can be found the user can set the environment variable @code{NLSPATH} to
+a value which describes her/his choice. Since this value must be usable
+for different languages and locales it cannot be a simple string.
+Instead it is a format string (similar to @code{printf}'s). An example
+is
+
+@smallexample
+/usr/share/locale/%L/%N:/usr/share/locale/%L/LC_MESSAGES/%N
+@end smallexample
+
+First one can see that more than one directory can be specified (with
+the usual syntax of separating them by colons). The next things to
+observe are the format elements, @code{%L} and @code{%N} in this case.
+The @code{catopen} function knows about several of them and the
+replacement for each of them is of course different.
+
+@table @code
+@item %N
+This format element is substituted with the name of the catalog file.
+This is the value of the @var{cat_name} argument given to
+@code{catopen}.
+
+@item %L
+This format element is substituted with the name of the currently
+selected locale for translating messages. How this is determined is
+explained below.
+
+@item %l
+(This is the lowercase ell.) This format element is substituted with the
+language element of the locale name. The string describing the selected
+locale is expected to have the form
+@code{@var{lang}[_@var{terr}[.@var{codeset}]]} and this format uses the
+first part @var{lang}.
+
+@item %t
+This format element is substituted by the territory part @var{terr} of
+the name of the currently selected locale. See the explanation of the
+format above.
+
+@item %c
+This format element is substituted by the codeset part @var{codeset} of
+the name of the currently selected locale. See the explanation of the
+format above.
+
+@item %%
+Since @code{%} is used as a meta character there must be a way to
+express the @code{%} character in the result itself. Using @code{%%}
+does this just like it works for @code{printf}.
+@end table
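+
+The substitution rules in the table above can be sketched in C as
+follows. This is only an illustration and not the code used by
+@code{catopen}: the function names @code{expand_nls_template} and
+@code{append} are hypothetical, the sketch handles a single entry of the
+colon-separated list, and treating an unknown format element as an
+error is an assumption of this example.
+

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append N bytes of SRC to OUT (capacity CAP,
   current length *LEN) and keep OUT NUL-terminated.  Returns 0 on
   success and -1 if the result would not fit.  */
static int
append (char *out, size_t cap, size_t *len, const char *src, size_t n)
{
  if (*len + n >= cap)
    return -1;
  memcpy (out + *len, src, n);
  *len += n;
  out[*len] = '\0';
  return 0;
}

/* Hypothetical sketch: expand one NLSPATH entry TMPL for the locale
   name LOCALE, which has the form lang[_terr[.codeset]], and the
   catalog name NAME.  Returns 0 on success, -1 on error.  */
int
expand_nls_template (const char *tmpl, const char *locale,
                     const char *name, char *out, size_t cap)
{
  /* Split the locale name into its three parts.  */
  size_t lang_len = strcspn (locale, "_.");
  const char *terr = "";
  size_t terr_len = 0;
  const char *codeset = "";
  const char *p = locale + lang_len;
  size_t len = 0;

  if (*p == '_')
    {
      terr = p + 1;
      terr_len = strcspn (terr, ".");
      p = terr + terr_len;
    }
  if (*p == '.')
    codeset = p + 1;

  if (cap == 0)
    return -1;
  out[0] = '\0';

  for (; *tmpl != '\0'; ++tmpl)
    {
      int rc = 0;

      if (*tmpl != '%')
        rc = append (out, cap, &len, tmpl, 1);
      else
        switch (*++tmpl)
          {
          case 'N': rc = append (out, cap, &len, name, strlen (name)); break;
          case 'L': rc = append (out, cap, &len, locale, strlen (locale)); break;
          case 'l': rc = append (out, cap, &len, locale, lang_len); break;
          case 't': rc = append (out, cap, &len, terr, terr_len); break;
          case 'c': rc = append (out, cap, &len, codeset, strlen (codeset)); break;
          case '%': rc = append (out, cap, &len, "%", 1); break;
          default:
            /* Unknown element or a trailing '%': an error in this sketch.  */
            return -1;
          }
      if (rc != 0)
        return -1;
    }
  return 0;
}
```

+
+@noindent
+With the template @code{/usr/share/locale/%L/LC_MESSAGES/%N}, the locale
+name @code{de_DE.UTF-8} and the catalog name @code{hello.cat} this sketch
+produces @file{/usr/share/locale/de_DE.UTF-8/LC_MESSAGES/hello.cat}.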
+
+
+Using @code{NLSPATH} allows arbitrary directories to be
+searched for message catalogs while still allowing different languages
+to be used. If the @code{NLSPATH} environment variable is not set the
+default value is
+
+@smallexample
+@var{prefix}/share/locale/%L/%N:@var{prefix}/share/locale/%L/LC_MESSAGES/%N
+@end smallexample
+
+@noindent
+where @var{prefix} is given to @code{configure} while installing the GNU
+C Library (this value is in many cases @code{/usr} or the empty string).
+
+The remaining problem is to decide which locale must be used. Its name
+determines the substitution of the format elements mentioned above.
+First of all the user can specify a path in the message catalog name
+(i.e., the name contains a slash character). In this situation the
+@code{NLSPATH} environment variable is not used. The catalog must exist
+as specified in the program, perhaps relative to the current working
+directory. This situation is not desirable and catalog names should
+never be written this way. Besides this, this behaviour is not portable
+to all other platforms providing the @code{catgets} interface.
+
+@cindex LC_ALL environment variable
+@cindex LC_MESSAGES environment variable
+@cindex LANG environment variable
+Otherwise the values of environment variables from the standard
+environment are examined (@pxref{Standard Environment}). Which
+variables are examined is decided by the @var{flag} parameter of
+@code{catopen}. If the value is @code{NL_CAT_LOCALE} (which is defined
+in @file{nl_types.h}) then the @code{catopen} function examines the
+environment variables @code{LC_ALL}, @code{LC_MESSAGES}, and @code{LANG}
+in this order. The first variable which is set in the current
+environment will be used.
+
+If @var{flag} is zero only the @code{LANG} environment variable is
+examined. This is a left-over from the early days of this function
+when the other environment variables were not known.
+
+In any case the environment variable should have a value of the form
+@code{@var{lang}[_@var{terr}[.@var{codeset}]]} as explained above. If
+no environment variable is set the @code{"C"} locale is used which
+prevents any translation.
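+
+The selection order described above can be sketched as follows. The
+helper @code{select_message_locale} is hypothetical and is not the
+implementation in the GNU C Library. It assumes that a variable counts
+as set whenever @code{getenv} returns a non-null pointer, and that any
+@var{flag} value other than @code{NL_CAT_LOCALE} is treated like zero.
+

```c
#include <nl_types.h>
#include <stdlib.h>

/* Hypothetical helper: return the locale name used for the catalog
   lookup, following the order described in the text.  */
const char *
select_message_locale (int flag)
{
  const char *value = NULL;

  if (flag == NL_CAT_LOCALE)
    {
      /* Examine LC_ALL first, then LC_MESSAGES; LANG follows below.  */
      value = getenv ("LC_ALL");
      if (value == NULL)
        value = getenv ("LC_MESSAGES");
    }

  /* With a zero FLAG only LANG is examined.  */
  if (value == NULL)
    value = getenv ("LANG");

  /* Without any setting the "C" locale is used, so nothing is
     translated.  */
  return value != NULL ? value : "C";
}
```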
+
+The return value of @code{catgets} is in any case a valid string. Either
+it is a translation from a message catalog or it is the same as the
+@var{string} parameter. So a piece of code to decide whether a
+translation actually happened must look like this:
+
+@smallexample
+@{
+ char *trans = catgets (desc, set, msg, input_string);
+ if (trans == input_string)
+ @{
+ /* Something went wrong. */
+ @}
+@}
+@end smallexample
+
+@noindent
+When an error occurred the global variable @var{errno} is set to
+
+@table @var
+@item EBADF
+The catalog does not exist.
+@item ENOMSG
+The set/message tuple does not name an existing element in the
+message catalog.
+@end table
+
+While it sometimes can be useful to test for errors programs normally
+will avoid any test. If the translation is not available it is no big
+problem if the original, untranslated message is printed. Either the
+user understands this as well or s/he will look for the reason why the
+messages are not translated.
+@end deftypefun
+
+Please note that the currently selected locale does not depend on a call
+to the @code{setlocale} function. It is not necessary that the locale
+data files for this locale exist and calling @code{setlocale} succeeds.
+The @code{catopen} function directly reads the values of the environment
+variables.
+
+
+@deftypefun {char *} catgets (nl_catd @var{catalog_desc}, int @var{set}, int @var{message}, const char *@var{string})
+The function @code{catgets} has to be used to access the message catalog
+previously opened using the @code{catopen} function. The
+@var{catalog_desc} parameter must be a value previously returned by
+@code{catopen}.
+
+The next two parameters, @var{set} and @var{message}, reflect the
+internal organization of the message catalog files. This will be
+explained in detail below. For now it is interesting to know that a
+catalog can consist of several sets and the messages in each set are
+individually numbered. Neither the set numbers nor the message
+numbers need to be consecutive. They can be arbitrarily chosen.
+But each message (unless equal to another one) must have its own unique
+pair of set and message number.
+
+Since it is not guaranteed that the message catalog for the language
+selected by the user exists the last parameter @var{string} helps to
+handle this case gracefully. If no matching string can be found
+@var{string} is returned. This means for the programmer that
+
+@itemize @bullet
+@item
+the @var{string} parameters should contain reasonable text (this also
+helps to understand the program since otherwise there would be no hint
+on the string which is expected to be returned).
+@item
+all @var{string} arguments should be written in the same language.
+@end itemize
+@end deftypefun
+
+It is somewhat uncomfortable to write a program using the @code{catgets}
+functions if no supporting functionality is available. Since each
+set/message number tuple must be unique the programmer must keep lists
+of the messages at the same time the code is written. And the work
+between several people working on the same project must be coordinated.
+In @ref{Common Usage} we will see how these problems can be relaxed
+a bit.
+
+@deftypefun int catclose (nl_catd @var{catalog_desc})
+The @code{catclose} function can be used to free the resources
+associated with a message catalog which previously was opened by a call
+to @code{catopen}. If the resources can be successfully freed the
+function returns @code{0}. Otherwise it returns @code{@minus{}1} and the
+global variable @var{errno} is set. Errors can occur if the catalog
+descriptor @var{catalog_desc} is not valid in which case @var{errno} is
+set to @code{EBADF}.
+@end deftypefun
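+
+Taken together, the three functions are typically used as in the
+following sketch. The helper name @code{print_message} is hypothetical.
+The sketch relies on the behaviour described above, namely that
+@code{catgets} returns its @var{string} argument when no translation is
+found, and it skips the lookup entirely when @code{catopen} failed.
+

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical helper: open the catalog CAT_NAME, print message MSG of
   set SET, and close the catalog again.  Returns 1 if a translation
   was printed and 0 if FALLBACK was printed instead.  */
int
print_message (const char *cat_name, int set, int msg,
               const char *fallback)
{
  nl_catd desc = catopen (cat_name, NL_CAT_LOCALE);
  const char *text = fallback;
  int translated;

  if (desc != (nl_catd) -1)
    text = catgets (desc, set, msg, fallback);

  /* catgets hands back FALLBACK itself when nothing was found, so
     comparing the pointers is enough.  */
  translated = text != fallback;

  /* Print before closing: the string may live in the catalog data.  */
  fputs (text, stdout);

  if (desc != (nl_catd) -1)
    catclose (desc);
  return translated;
}
```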
+
+
+@node The message catalog files
+@subsection Format of the message catalog files
+
+The only reasonable way to translate all the messages of a program and
+store the result in a message catalog file which can be read by the
+@code{catopen} function is to hand all the message texts to the
+translator and let her/him translate them all. I.e., we must have a
+file with entries which associate the set/message tuple with a specific
+translation. This file format is specified in the X/Open standard and
+is as follows:
+
+@itemize @bullet
+@item
+Lines containing only whitespace characters or empty lines are ignored.
+
+@item
+Lines which contain as the first non-whitespace character a @code{$}
+followed by a whitespace character are comments and are also ignored.
+
+@item
+If a line contains as the first non-whitespace characters the sequence
+@code{$set} followed by a whitespace character an additional argument
+is required to follow. This argument can either be:
+
+@itemize @minus
+@item
+a number. In this case the value of this number determines the set
+to which the following messages are added.
+
+@item
+an identifier consisting of alphanumeric characters plus the underscore
+character. In this case the set automatically gets a number assigned.
+This value is one more than the largest set number which has appeared so far.
+
+How to use the symbolic names is explained in section @ref{Common Usage}.
+
+It is an error if a symbol name appears more than once. All following
+messages are placed in a set with this number.
+@end itemize
+
+@item
+If a line contains as the first non-whitespace characters the sequence
+@code{$delset} followed by a whitespace character an additional argument
+is required to follow. This argument can either be:
+
+@itemize @minus
+@item
+a number. In this case the value of this number determines the set
+which will be deleted.
+
+@item
+an identifier consisting of alphanumeric characters plus the underscore
+character. This symbolic identifier must match a name for a set which
+previously was defined. It is an error if the name is unknown.
+@end itemize
+
+In both cases all messages in the specified set will be removed. They
+will not appear in the output. But if this set is later selected again
+with a @code{$set} command, messages could be added again and these
+messages will appear in the output.
+
+@item
+If a line contains after leading whitespace the sequence
+@code{$quote}, the quoting character used for this input file is
+changed to the first non-whitespace character following the
+@code{$quote}. If no non-whitespace character is present before the
+line ends quoting is disabled.
+
+By default no quoting character is used. In this mode strings are
+terminated with the first unescaped line break. If there is a
+@code{$quote} sequence present newlines need not be escaped. Instead a
+string is terminated with the first unescaped appearance of the quote
+character.
+
+A common usage of this feature would be to set the quote character to
+@code{"}. Then any appearance of the @code{"} in the strings must
+be escaped using the backslash (i.e., @code{\"} must be written).
+
+@item
+Any other line must start with a number or an alphanumeric identifier
+(with the underscore character included). The following characters
+(starting at the first non-whitespace character) will form the string
+which gets associated with the currently selected set and the message
+number represented by the number and identifier respectively.
+
+If the start of the line is a number the message number is obvious. It
+is an error if the same message number already appeared for this set.
+
+If the leading token was an identifier the message number gets
+automatically assigned. The value is the current maximum message
+number for this set plus one. It is an error if the identifier was
+already used for a message in this set. It is ok to reuse the
+identifier for a message in another set. How to use the symbolic
+identifiers will be explained below (@pxref{Common Usage}). There is
+one limitation with the identifier: it must not be @code{Set}. The
+reason will be explained below.
+
+Please note that you must use a quoting character if a message contains
+leading whitespace. Since one cannot guarantee this never happens it is
+probably a good idea to always use quoting.
+
+The text of the messages can contain escape characters. The usual bunch
+of characters known from the @w{ISO C} language are recognized
+(@code{\n}, @code{\t}, @code{\v}, @code{\b}, @code{\r}, @code{\f},
+@code{\\}, and @code{\@var{nnn}}, where @var{nnn} is the octal coding of
+a character code).
+@end itemize
+
+@strong{Important:} The handling of identifiers instead of numbers for
+the set and messages is a GNU extension. Systems strictly following the
+X/Open specification do not have this feature. An example for a message
+catalog file is this:
+
+@smallexample
+$ This is a leading comment.
+$quote "
+
+$set SetOne
+1 Message with ID 1.
+two " Message with ID \"two\", which gets the value 2 assigned"
+
+$set SetTwo
+$ Since the last set got the number 1 assigned this set has number 2.
+4000 "The numbers can be arbitrary, they need not start at one."
+@end smallexample
+
+This small example shows various aspects:
+@itemize @bullet
+@item
+Lines 1 and 9 are comments since they start with @code{$} followed by
+a whitespace.
+@item
+The quoting character is set to @code{"}. Otherwise the quotes in the
+message definition would have to be left out and in this case the
+message with the identifier @code{two} would lose its leading whitespace.
+@item
+Mixing numbered messages with messages having symbolic names is no
+problem and the numbering happens automatically.
+@end itemize
+
+
+While this file format is pretty easy it is not the best possible for
+use in a running program. The @code{catopen} function would have to
+parse the file and handle syntactic errors gracefully. This is not so
+easy and the whole process is pretty slow. Therefore the @code{catgets}
+functions expect the data in another more compact and ready-to-use file
+format. There is a special program @code{gencat} which is explained in
+detail in the next section.
+
+Files in this other format are not human readable. To be easy to use by
+programs it is a binary file. But the format is byte order independent
+so translation files can be shared by systems of arbitrary architecture
+(as long as they use the GNU C Library).
+
+Details about the binary file format are not important to know since
+these files are always created by the @code{gencat} program. The
+sources of the GNU C Library also provide the sources for the
+@code{gencat} program and so the interested reader can look through
+these source files to learn about the file format.
+
+
+@node The gencat program
+@subsection Generate Message Catalog Files
+
+@cindex gencat
+The @code{gencat} program is specified in the X/Open standard and the
+GNU implementation follows this specification and so is able to process
+all correctly formed input files. Additionally some extensions are
+implemented which help to work in a more reasonable way with the
+@code{catgets} functions.
+
+The @code{gencat} program can be invoked in two ways:
+
+@example
+`gencat [@var{Option}]@dots{} [@var{Output-File} [@var{Input-File}]@dots{}]`
+@end example
+
+This is the interface defined in the X/Open standard. If no
+@var{Input-File} parameter is given input will be read from standard
+input. Multiple input files will be read as if they are concatenated.
+If @var{Output-File} is also missing, the output will be written to
+standard output. To provide the interface one is used to from other
+programs a second interface is provided.
+
+@smallexample
+`gencat [@var{Option}]@dots{} -o @var{Output-File} [@var{Input-File}]@dots{}`
+@end smallexample
+
+The option @samp{-o} is used to specify the output file and all file
+arguments are used as input files.
+
+Besides this one can use @file{-} or @file{/dev/stdin} for
+@var{Input-File} to denote the standard input. Correspondingly one can
+use @file{-} and @file{/dev/stdout} for @var{Output-File} to denote
+standard output. Using @file{-} as a file name is allowed in X/Open
+while using the device names is a GNU extension.
+
+The @code{gencat} program works by concatenating all input files and
+then @strong{merging} the resulting collection of message sets with a
+possibly existing output file. This is done by removing all messages
+with set/message number tuples matching any of the generated messages
+from the output file and then adding all the new messages. To
+regenerate a catalog file while ignoring the old contents therefore
+requires removing the output file if it exists. If the output is
+written to standard output no merging takes place.
+
+@noindent
+The following table shows the options understood by the @code{gencat}
+program. The X/Open standard does not specify any option for the
+program so all of these are GNU extensions.
+
+@table @samp
+@item -V
+@itemx --version
+Print the version information and exit.
+@item -h
+@itemx --help
+Print a usage message listing all available options, then exit successfully.
+@item --new
+Never merge the new messages from the input files with the old content
+of the output file. The old content of the output file is discarded.
+@item -H
+@itemx --header=name
+This option is used to emit the symbolic names given to sets and
+messages in the input files for use in the program. Details about how
+to use this are given in the next section. The @var{name} parameter to
+this option specifies the name of the output file. It will contain a
+number of C preprocessor @code{#define}s to associate a name with a
+number.
+
+Please note that the generated file only contains the symbols from the
+input files. If the output is merged with the previous content of the
+output file the possibly existing symbols from the file(s) which
+generated the old output files are not in the generated header file.
+@end table
+
+
+@node Common Usage
+@subsection How to use the @code{catgets} interface
+
+The @code{catgets} functions can be used in two different ways: by
+slavishly following the X/Open specs and not relying on the extensions,
+or by using the GNU extensions. We will take a look at the former
+method first to understand the benefits of the extensions.
+
+@subsubsection Not using symbolic names
+
+Since the X/Open format of the message catalog files does not allow
+symbol names we have to work with numbers all the time. When we start
+writing a program we have to replace all appearances of translatable
+strings with something like
+
+@smallexample
+catgets (catdesc, set, msg, "string")
+@end smallexample
+
+@noindent
+@var{catdesc} is retrieved from a call to @code{catopen} which is
+normally done once at the program start. The @code{"string"} is the
+string we want to translate. The problems start with the set and
+message numbers.
+
+In a bigger program several programmers usually work at the same time on
+the program and so coordinating the number allocation is crucial.
+Though no two different strings may be indexed by the same tuple of
+numbers it is highly desirable to reuse the numbers for equal strings
+with equal translations (please note that there might be strings which
+are equal in one language but have different translations due to
+different contexts).
+
+The allocation process can be relaxed a bit by using different set
+numbers for different parts of the program. So the number of developers
+who have to coordinate the allocation can be reduced. But still lists
+must be kept to track the allocation and errors can easily happen.
+These errors cannot be discovered by the compiler or the @code{catgets}
+functions. Only the user of the program might see wrong messages
+printed. In the worst cases the messages are so misleading that they
+cannot be recognized as wrong. Think about the translations for
+@code{"true"} and @code{"false"} being exchanged. This could result in
+a disaster.
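+
+One way to keep such a list, sketched here under the assumption that
+all numbers live in a single hand-written header, is to give every
+number a name. The names @code{SET_MAIN}, @code{MSG_MAIN_TRUE},
+@code{MSG_MAIN_FALSE} and @code{bool_text} are hypothetical. This does
+not remove the need for coordination with the catalog source file; it
+only makes a mix-up easier to spot in the program source.
+

```c
#include <nl_types.h>

/* Hypothetical numbering, maintained by hand in one place.  The
   values must match the numbers used in the message catalog.  */
enum { SET_MAIN = 1 };
enum { MSG_MAIN_TRUE = 1, MSG_MAIN_FALSE = 2 };

/* Hypothetical helper: return the (possibly translated) text for a
   truth value.  CATDESC is a descriptor returned by catopen.  */
const char *
bool_text (nl_catd catdesc, int value)
{
  return value
         ? catgets (catdesc, SET_MAIN, MSG_MAIN_TRUE, "true")
         : catgets (catdesc, SET_MAIN, MSG_MAIN_FALSE, "false");
}
```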
+
+
+@subsubsection Using symbolic names
+
+The problems mentioned in the last section derive from the fact that:
+
+@enumerate
+@item
+the numbers are allocated once and due to the possibly frequent use of
+them it is difficult to change a number later.
+@item
+the numbers give no hint about the string they stand for and
+therefore collisions can easily happen.
+@end enumerate
+
+By consistently using symbolic names and by providing a method which maps
+the string content to a symbolic name (however this will happen) one can
+prevent both problems above. The cost of this is that the programmer
+has to write a complete message catalog file while s/he is writing the
+program itself.
+
+This is necessary since the symbolic names must be mapped to numbers
+before the program sources can be compiled. In the last section it was
+described how to generate a header containing the mapping of the names.
+E.g., for the example message file given in the last section we could
+call the @code{gencat} program as follows (assume @file{ex.msg} contains
+the sources).
+
+@smallexample
+gencat -H ex.h -o ex.cat ex.msg
+@end smallexample
+
+@noindent
+This generates a header file with the following content:
+
+@smallexample
+#define SetTwoSet 0x2 /* ex.msg:8 */
+
+#define SetOneSet 0x1 /* ex.msg:4 */
+#define SetOnetwo 0x2 /* ex.msg:6 */
+@end smallexample
+
+As can be seen the various symbols given in the source file are mangled
+to generate unique identifiers and these identifiers get numbers
+assigned. Reading the source file and knowing about the rules will
+allow one to predict the content of the header file (it is deterministic)
+but this is not necessary. The @code{gencat} program can take care of
+everything. All the programmer has to do is to put the generated header
+file in the dependency list of the source files of her/his project and
+to add a rule to regenerate the header if any of the input files
+changes.
+
+One word about the symbol mangling. Every symbol consists of two parts:
+the name of the message set plus the name of the message or the special
+string @code{Set}. So @code{SetOnetwo} means this macro can be used to
+access the translation with identifier @code{two} in the message set
+@code{SetOne}.
+
+The other names denote the names of the message sets. The special
+string @code{Set} is used in the place of the message identifier.
+
+If in the code the second string of the set @code{SetOne} is used the C
+code should look like this:
+
+@smallexample
+catgets (catdesc, SetOneSet, SetOnetwo,
+ " Message with ID \"two\", which gets the value 2 assigned")
+@end smallexample
+
+Writing the function this way will allow to change the message number
+and even the set number without requiring any change in the C source
+code. (The text of the string is normally not the same; this is only
+for this example.)
+
+
+@subsubsection How does this allow one to develop
+
+To illustrate the usual way to work with the symbolic names
+here is a little example. Assume we want to write the very complex and
+famous greeting program. We start by writing the code as usual:
+
+@smallexample
+#include <stdio.h>
+int
+main (void)
+@{
+ printf ("Hello, world!\n");
+ return 0;
+@}
+@end smallexample
+
+Now we want to internationalize the message and therefore replace the
+message with whatever the user wants.
+
+@smallexample
+#include <nl_types.h>
+#include <stdio.h>
+#include "msgnrs.h"
+int
+main (void)
+@{
+ nl_catd catdesc = catopen ("hello.cat", NL_CAT_LOCALE);
+ printf (catgets (catdesc, MainSet, MainHello, "Hello, world!\n"));
+ catclose (catdesc);
+ return 0;
+@}
+@end smallexample
+
+We see how the catalog object is opened and the returned descriptor used
+in the other function calls. It is not really necessary to check for
+failure of any of the functions since even in these situations the
+functions will behave reasonably. They will simply return the
+untranslated default string.
+
+What remains unspecified here are the constants @code{MainSet} and
+@code{MainHello}. These are the symbolic names describing the
+message. To get the actual definitions which match the information in
+the catalog file we have to create the message catalog source file and
+process it using the @code{gencat} program.
+
+@smallexample
+$ Messages for the famous greeting program.
+$quote "
+
+$set Main
+Hello "Hallo, Welt!\n"
+@end smallexample
+
+Now we can start building the program (assume the message catalog source
+file is named @file{hello.msg} and the program source file @file{hello.c}):
+
+@smallexample
+@cartouche
+% gencat -H msgnrs.h -o hello.cat hello.msg
+% cat msgnrs.h
+#define MainSet 0x1 /* hello.msg:4 */
+#define MainHello 0x1 /* hello.msg:5 */
+% gcc -o hello hello.c -I.
+% cp hello.cat /usr/share/locale/de/LC_MESSAGES
+% echo $LC_ALL
+de
+% ./hello
+Hallo, Welt!
+%
+@end cartouche
+@end smallexample
+
+The call of the @code{gencat} program creates the missing header file
+@file{msgnrs.h} as well as the message catalog binary. The former is
+used in the compilation of @file{hello.c} while the latter is placed in a
+directory in which the @code{catopen} function will try to locate it.
+Please check the @code{LC_ALL} environment variable and the default path
+for @code{catopen} presented in the description above.
+
+
+@node The Uniforum approach
+@section The Uniforum approach to Message Translation
+
+Sun Microsystems tried to standardize a different approach to message
+translation in the Uniforum group. There never was a real standard
+defined but still the interface was used in Sun's operating systems.
+Since this approach fits better in the development process of free
+software it is also used throughout the GNU project, and the GNU
+@file{gettext} package provides support for this outside the GNU C
+Library.
+
+The code of the @file{libintl} from GNU @file{gettext} is the same as
+the code in the GNU C Library. So the documentation in the GNU
+@file{gettext} manual is also valid for the functionality here. The
+following text will describe the library functions in detail. But the
+numerous helper programs are not described in this manual. Instead
+people should read the GNU @file{gettext} manual
+(@pxref{Top,,GNU gettext utilities,gettext,Native Language Support Library and Tools}).
+We will only give a short overview.
+
+Though the @code{catgets} functions are available by default on more
+systems, the @code{gettext} interface is at least as portable as the
+former. The GNU @file{gettext} package can be used wherever the
+functions are not available.
+
+
+@menu
+* Message catalogs with gettext:: The @code{gettext} family of functions.
+* Helper programs for gettext:: Programs to handle message catalogs
+ for @code{gettext}.
+@end menu
+
+
+@node Message catalogs with gettext
+@subsection The @code{gettext} family of functions
+
+The paradigm underlying the @code{gettext} approach to message
+translation is different from that of the @code{catgets} functions, but
+the basic functionality is equivalent. There are functions of the
+following categories:
+
+@menu
+* Translation with gettext:: What has to be done to translate a message.
+* Locating gettext catalog:: How to determine which catalog to be used.
+* Using gettextized software:: The possibilities of the user to influence
+ the way @code{gettext} works.
+@end menu
+
+@node Translation with gettext
+@subsubsection What has to be done to translate a message?
+
+The @code{gettext} functions have a very simple interface. The most
+basic function just takes the string which shall be translated as the
+argument and it returns the translation. This is fundamentally
+different from the @code{catgets} approach where an extra key is
+necessary and the original string is only used for the error case.
+
+If the string which has to be translated is the only argument this of
+course means the string itself is the key. I.e., the translation will
+be selected based on the original string. The message catalogs must
+therefore contain the original strings plus one translation for any such
+string. The task of the @code{gettext} function is to compare the
+argument string with the available strings in the catalog and return the
+appropriate translation. Of course this process is optimized so that it
+is not more expensive than an access using an atomic key like in
+@code{catgets}.
+
+The @code{gettext} approach has some advantages but also some
+disadvantages. Please see the GNU @file{gettext} manual for a detailed
+discussion of the pros and cons.
+
+All the definitions and declarations for @code{gettext} can be found in
+the @file{libintl.h} header file. On systems where these functions are
+not part of the C library they can be found in a separate library named
+@file{libintl.a} (or accordingly different for shared libraries).
+
+@deftypefun {char *} gettext (const char *@var{msgid})
+The @code{gettext} function searches the currently selected message
+catalogs for a string which is equal to @var{msgid}. If there is such a
+string available it is returned. Otherwise the argument string
+@var{msgid} is returned.
+
+Please note that although the return value is @code{char *} the
+returned string must not be changed. This broken type results from the
+history of the function and does not reflect the way the function should
+be used.
+
+Please note that above we wrote ``message catalogs'' (plural). This is
+a speciality of the GNU implementation of these functions and we will
+say more about this when we talk about the ways message catalogs are
+selected (@pxref{Locating gettext catalog}).
+
+The @code{gettext} function does not modify the value of the global
+@code{errno} variable. This is necessary to make it possible to write
+something like
+
+@smallexample
+ printf (gettext ("Operation failed: %m\n"));
+@end smallexample
+
+Here the @code{errno} value is used in the @code{printf} function while
+processing the @code{%m} format element. If the @code{gettext}
+function changed this value (it is called before @code{printf} is
+called) we would get a wrong message.
+
+So there is no easy way to detect a missing message catalog besides
+comparing the argument string with the result. But it is normally the
+task of the user to react to missing catalogs. The program cannot guess
+when a message catalog is really necessary since a user who speaks the
+language the program was developed in does not need any translation.
+@end deftypefun
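+
+The two properties just described, the fallback to @var{msgid} and the
+preserved @code{errno} value, can be checked with a small sketch like
+the following. The helper names @code{is_untranslated} and
+@code{gettext_keeps_errno} are made up for this illustration.

```c
#include <errno.h>
#include <libintl.h>
#include <string.h>

/* Hypothetical helper: nonzero if gettext handed back text equal to
   MSGID, i.e. no translation was found in the selected catalogs.  */
int
is_untranslated (const char *msgid)
{
  return strcmp (gettext (msgid), msgid) == 0;
}

/* Hypothetical helper: nonzero if errno survives a gettext call.  */
int
gettext_keeps_errno (const char *msgid)
{
  errno = ENOENT;
  (void) gettext (msgid);
  return errno == ENOENT;
}
```

+
+Note that @code{is_untranslated} also reports true when a catalog
+exists but maps the message to itself, so it is only a heuristic for
+detecting a missing catalog.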
+
+The remaining two functions to access the message catalog add some
+functionality to select a message catalog which is not the default one.
+This is important if parts of the program are developed independently.
+Every part can have its own message catalog and all of them can be used
+at the same time. The C library itself is an example: internally it
+uses the @code{gettext} functions but since it must not depend on a
+currently selected default message catalog it must specify all ambiguous
+information.
+
+@deftypefun {char *} dgettext (const char *@var{domainname}, const char *@var{msgid})
+The @code{dgettext} function acts just like the @code{gettext}
+function. It only takes an additional first argument @var{domainname}
+which guides the selection of the message catalogs which are searched
+for the translation. If the @var{domainname} parameter is the null
+pointer the @code{dgettext} function is exactly equivalent to
+@code{gettext} since the default value for the domain name is used.
+
+As for @code{gettext} the return value type is @code{char *} which is an
+anachronism. The returned string must never be modified.
+@end deftypefun
+
+@deftypefun {char *} dcgettext (const char *@var{domainname}, const char *@var{msgid}, int @var{category})
+The @code{dcgettext} function adds another argument to those which
+@code{dgettext} takes. This argument @var{category} specifies the last
+piece of information needed to locate the message catalog. I.e., the
+domain name and the locale category exactly specify which message
+catalog has to be used (relative to a given directory, see below).
+
+The @code{dgettext} function can be expressed in terms of
+@code{dcgettext} by using
+
+@smallexample
+dcgettext (domain, string, LC_MESSAGES)
+@end smallexample
+
+@noindent
+instead of
+
+@smallexample
+dgettext (domain, string)
+@end smallexample
+
+This also shows which values are expected for the third parameter. One
+has to use the available selectors for the categories available in
+@file{locale.h}. Normally the available values are @code{LC_CTYPE},
+@code{LC_COLLATE}, @code{LC_MESSAGES}, @code{LC_MONETARY},
+@code{LC_NUMERIC}, and @code{LC_TIME}. Please note that @code{LC_ALL}
+must not be used and even though the names might suggest this, there is
+no relation to the environment variables of this name.
+
+The @code{dcgettext} function is only implemented for compatibility with
+other systems which have @code{gettext} functions. There is not really
+any situation where it is necessary (or useful) to use a value other
+than @code{LC_MESSAGES} for the @var{category} parameter. We are
+dealing with messages here and any other choice can only be confusing.
+
+As for @code{gettext} the return value type is @code{char *} which is an
+anachronism. The returned string must never be modified.
+@end deftypefun
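+
+As a minimal sketch of the equivalence stated above, the following
+helper performs the same lookup through both functions and compares the
+results. The name @code{same_lookup} is an assumption made for this
+example only.

```c
#include <libintl.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: nonzero if dgettext and dcgettext with
   LC_MESSAGES yield the same text for MSGID in DOMAIN.  */
int
same_lookup (const char *domain, const char *msgid)
{
  const char *via_dgettext = dgettext (domain, msgid);
  const char *via_dcgettext = dcgettext (domain, msgid, LC_MESSAGES);

  return strcmp (via_dgettext, via_dcgettext) == 0;
}
```

+
+Passing a null pointer as @var{domain} makes both calls use the default
+domain, so the comparison then also covers plain @code{gettext}.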
+
+When using the three functions above in a program it is a frequent case
+that the @var{msgid} argument is a constant string. So it is worth
+optimizing this case. A moment's thought shows that as long as no new
+message catalog is loaded the translation of a message will not change.
+I.e., the algorithm to determine the translation is deterministic.
+
+This is exactly what the optimizations implemented in the
+@file{libintl.h} header use. Whenever a program is compiled with
+the GNU C compiler, optimization is selected, and the @var{msgid}
+argument to @code{gettext}, @code{dgettext} or @code{dcgettext} is a
+constant string, the actual function call is only made the first
+time the message is used and after that only if a new message catalog
+was loaded and so the result of the translation lookup might be
+different. See the @file{libintl.h} header file for details. For the
+user it is only important to know that the result is always the same,
+independent of the compiler or compiler options in use.
+
+
+@node Locating gettext catalog
+@subsubsection How to determine which catalog to be used
+
+The functions to retrieve the translations for a given message have a
+remarkably simple interface. But to still give the user of the program
+the opportunity to select exactly the translation s/he wants, and
+to give the programmer the possibility to influence where the catalog
+files are searched for, there is a quite complicated underlying
+mechanism which controls all this. The code is complicated, but the
+use is easy.
+
+Basically we have two different tasks to perform which can also be
+performed by the @code{catgets} functions:
+
+@enumerate
+@item
+Locate the set of message catalogs. There are a number of files for
+different languages which all belong to the package. Usually they
+are all stored in the filesystem below a certain directory.
+
+There can be arbitrarily many packages installed and they can follow
+different guidelines for the placement of their files.
+
+@item
+Relative to the location specified by the package the actual translation
+files must be searched, based on the wishes of the user. I.e., for each
+language the user selects the program should be able to locate the
+appropriate file.
+@end enumerate
+
+This is the functionality required by the specifications for
+@code{gettext} and this is also what the @code{catgets} functions are
+able to do. But there are some problems unresolved:
+
+@itemize @bullet
+@item
+The language to be used can be specified in several different ways.
+There is no generally accepted standard for this and the user always
+expects the program to understand what s/he means. E.g., to select the
+German translation one could write @code{de}, @code{german}, or
+@code{deutsch} and the program should always react the same.
+
+@item
+Sometimes the specification of the user is too detailed. If s/he, e.g.,
+specifies @code{de_DE.ISO-8859-1} which means German, spoken in Germany,
+coded using the @w{ISO 8859-1} character set there is the possibility
+that a message catalog matching this exactly is not available. But
+there could be a catalog matching @code{de} and if the character set
+used on the machine is always @w{ISO 8859-1} there is no reason why this
+latter message catalog should not be used. (We call this @dfn{message
+inheritance}.)
+
+@item
+If a catalog for a wanted language is not available it is not always the
+second best choice to fall back on the language of the developer and
+simply not translate any message. Instead a user might be better able
+to read the messages in another language and so the user of the program
+should be able to define a precedence order of languages.
+@end itemize
+
+We can divide the configuration actions into two parts: one is
+performed by the programmer, the other by the user. We will start with
+the functions the programmer can use since the user configuration will
+be based on this.
+
+As the descriptions of the functions in the last section already
+mention, separate sets of messages can be selected by a @dfn{domain
+name}. This is a simple string which should be unique for each program
+part which uses a separate domain. It is possible to use arbitrarily
+many domains in one program at the same time. E.g., the GNU C Library
+itself uses a domain named @code{libc} while the program using the C
+Library could use a domain named @code{foo}. The important point is
+that at any time exactly one domain is active. This is controlled with
+the following function.
+
+@deftypefun {char *} textdomain (const char *@var{domainname})
+The @code{textdomain} function sets the default domain, which is used in
+all future @code{gettext} calls, to @var{domainname}. Please note that
+@code{dgettext} and @code{dcgettext} calls are not influenced if the
+@var{domainname} parameter of these functions is not the null pointer.
+
+Before the first call to @code{textdomain} the default domain is
+@code{messages}. This is the name specified in the specification of
+the @code{gettext} API. This name is as good as any other name. No
+program should ever really use a domain with this name since this can
+only lead to problems.
+
+The function returns the value which is from now on taken as the default
+domain. If the system ran out of memory the returned value is
+@code{NULL} and the global variable @code{errno} is set to @code{ENOMEM}.
+Despite the return value type being @code{char *} the returned string
+must not be changed. It is allocated internally by the @code{textdomain}
+function.
+
+If the @var{domainname} parameter is the null pointer no new default
+domain is set. Instead the currently selected default domain is
+returned.
+
+If the @var{domainname} parameter is the empty string the default domain
+is reset to its initial value, the domain with the name @code{messages}.
+This possibility is questionable to use since the domain @code{messages}
+really never should be used.
+@end deftypefun
+
+@deftypefun {char *} bindtextdomain (const char *@var{domainname}, const char *@var{dirname})
+The @code{bindtextdomain} function can be used to specify the directory
+which contains the message catalogs for domain @var{domainname} for the
+different languages. To be correct, this is the directory where the
+hierarchy of directories is expected. Details are explained below.
+
+For the programmer it is important to note that the translations which
+come with the program have to be placed in a directory hierarchy starting
+at, say, @file{/foo/bar}. Then the program should make a
+@code{bindtextdomain} call to bind the domain for the current program to
+this directory. This ensures that the catalogs are found. A correctly
+running program does not depend on the user setting an environment
+variable.
+
+The @code{bindtextdomain} function can be used several times and if the
+@var{domainname} argument is different the previously bound domains
+will not be overwritten.
+
+If the @var{dirname} parameter is the null pointer @code{bindtextdomain}
+returns the currently selected directory for the domain with the name
+@var{domainname}.
+
+The @code{bindtextdomain} function returns a pointer to a string
+containing the name of the selected directory. The string is
+allocated internally in the function and must not be changed by the
+user. If the system ran out of memory during the execution of
+@code{bindtextdomain} the return value is @code{NULL} and the global
+variable @code{errno} is set accordingly.
+@end deftypefun
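+
+The return value conventions of @code{textdomain} and
+@code{bindtextdomain} described above can be put together in a short
+sketch. The helpers @code{setup_domain}, @code{default_domain_is} and
+@code{domain_bound_to} are hypothetical names chosen for this example.

```c
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: make DOMAIN the default domain and bind it to
   DIR.  Returns 0 on success and -1 if either call reported failure.  */
int
setup_domain (const char *domain, const char *dir)
{
  if (textdomain (domain) == NULL)
    return -1;
  if (bindtextdomain (domain, dir) == NULL)
    return -1;
  return 0;
}

/* Nonzero if the current default domain is DOMAIN.  A null pointer
   argument only queries the setting.  */
int
default_domain_is (const char *domain)
{
  const char *current = textdomain (NULL);
  return current != NULL && strcmp (current, domain) == 0;
}

/* Nonzero if DOMAIN is currently bound to the directory DIR.  */
int
domain_bound_to (const char *domain, const char *dir)
{
  const char *current = bindtextdomain (domain, NULL);
  return current != NULL && strcmp (current, dir) == 0;
}
```

+
+The strings returned by both library functions are only compared here
+and never modified, in line with the restriction stated above.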
+
+
+@node Using gettextized software
+@subsubsection User influence on @code{gettext}
+
+The last sections described what the programmer can do to
+internationalize the messages of the program. But it is finally up to
+the user to select the message s/he wants to see. S/He must understand
+them.
+
+The POSIX locale model uses the environment variables @code{LC_COLLATE},
+@code{LC_CTYPE}, @code{LC_MESSAGES}, @code{LC_MONETARY}, @code{LC_NUMERIC},
+and @code{LC_TIME} to select the locale which is to be used. This way
+the user can influence lots of functions. As we mentioned above the
+@code{gettext} functions also take advantage of this.
+
+To understand how this happens it is necessary to take a look at the
+various components of the filename which gets computed to locate a
+message catalog. It is composed as follows:
+
+@smallexample
+@var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo
+@end smallexample
+
+The default value for @var{dir_name} is system specific. It is computed
+from the value given as the prefix while configuring the C library.
+This value normally is @file{/usr} or @file{/}. For the former the
+complete @var{dir_name} is:
+
+@smallexample
+/usr/share/locale
+@end smallexample
+
+We can use @file{/usr/share} since the @file{.mo} files containing the
+message catalogs are system independent, so all systems can use the same
+files. If the program executed the @code{bindtextdomain} function for
+the message domain that is currently handled, the @var{dir_name}
+component is exactly the value which was given to the function as
+the second parameter. I.e., @code{bindtextdomain} allows overriding
+the only system dependent and fixed value to make it possible to
+address files anywhere in the filesystem.
+
+The @var{category} is the name of the locale category which was selected
+in the program code. For @code{gettext} and @code{dgettext} this is
+always @code{LC_MESSAGES}, for @code{dcgettext} this is selected by the
+value of the third parameter. As said above one should avoid ever
+using a category other than @code{LC_MESSAGES}.
+
+The @var{locale} component is computed based on the category used. Just
+like for the @code{setlocale} function this is where the user selection
+comes into play. Some environment variables are examined in a fixed order
+and the first environment variable set determines the return value of
+the lookup process. In detail, for the category @code{LC_xxx} the
+following variables in this order are examined:
+
+@table @code
+@item LANGUAGE
+@item LC_ALL
+@item LC_xxx
+@item LANG
+@end table
+
+This looks very familiar. With the exception of the @code{LANGUAGE}
+environment variable this is exactly the lookup order the
+@code{setlocale} function uses. But why introduce the @code{LANGUAGE}
+variable?
+
+The reason is that the syntax of the values this variable can have is
+different from what is expected by the @code{setlocale} function. If we
+set @code{LC_ALL} to a value following the extended syntax, the
+@code{setlocale} function would no longer be able to use the value of
+this variable. An additional variable removes this problem, and it also
+lets us select the language independently of the locale setting, which
+is sometimes useful.
+
+While for the @code{LC_xxx} variables the value should consist of
+exactly one specification of a locale the @code{LANGUAGE} variable's
+value can consist of a colon separated list of locale names. The
+attentive reader will realize that this is the way we manage to
+implement one of our additional demands above: we want to be able to
+specify an ordered list of languages.
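+
+The lookup order for the category @code{LC_MESSAGES} can be sketched as
+follows. This is only an illustration of the order listed above, not
+the code the library uses. The function name, the skipping of empty
+values, and the @code{"C"} fallback are assumptions of this example.

```c
#include <stdlib.h>

/* Hypothetical helper: return the value of the first variable that is
   set to a non-empty string, following the order LANGUAGE, LC_ALL,
   LC_MESSAGES, LANG.  Falls back to "C" (an assumption of this sketch)
   when none of them is set.  */
const char *
first_message_locale_setting (void)
{
  static const char *const names[] =
    { "LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG" };
  size_t i;

  for (i = 0; i < sizeof names / sizeof names[0]; i++)
    {
      const char *value = getenv (names[i]);
      if (value != NULL && value[0] != '\0')
        return value;
    }
  return "C";
}
```

+
+A value taken from @code{LANGUAGE} may still be a colon separated list
+such as @code{sv:de}, so a caller would have to split it before using
+the individual locale names.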
+
+Back to the constructed filename we have only one component missing.
+The @var{domain_name} part is the name which was either registered using
+the @code{textdomain} function or which was given to @code{dgettext} or
+@code{dcgettext} as the first parameter. Now it becomes obvious that a
+good choice for the domain name in the program code is a string which is
+closely related to the program/package name. E.g., for the GNU C
+Library the domain name is @code{libc}.
+
+@noindent
+A small piece of example code should show how the programmer is supposed
+to work:
+
+@smallexample
+@{
+ textdomain ("test-package");
+ bindtextdomain ("test-package", "/usr/local/share/locale");
+ puts (gettext ("Hello, world!"));
+@}
+@end smallexample
+
+At the program start the default domain is @code{messages}. The
+@code{textdomain} call changes this to @code{test-package}. The
+@code{bindtextdomain} call specifies that the message catalogs for the
+domain @code{test-package} can be found below the directory
+@file{/usr/local/share/locale}.
+
+If the user now sets the variable @code{LANGUAGE} in her/his environment
+to @code{de} the @code{gettext} function will try to use the
+translations from the file
+
+@smallexample
+/usr/local/share/locale/de/LC_MESSAGES/test-package.mo
+@end smallexample
+
+From the above descriptions it should be clear which component of this
+filename is determined by which source.
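+
+How the four components are joined can be sketched with a single
+@code{snprintf} call. The function @code{catalog_path} is a
+hypothetical helper for illustration, and its @var{category} argument
+is assumed to be the full category name such as @code{LC_MESSAGES}.

```c
#include <stdio.h>

/* Hypothetical helper: write DIR_NAME/LOCALE/CATEGORY/DOMAIN_NAME.mo
   into BUF.  Returns what snprintf returns, i.e. the length of the
   complete path, which is >= BUFLEN if the path was truncated.  */
int
catalog_path (char *buf, size_t buflen, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
{
  return snprintf (buf, buflen, "%s/%s/%s/%s.mo",
                   dir_name, locale, category, domain_name);
}
```

+
+With the values from the example, @file{/usr/local/share/locale},
+@code{de}, @code{LC_MESSAGES} and @code{test-package}, this yields the
+filename shown above.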
+
+@c Describe:
+@c * message inheritance
+@c * locale aliasing
+@c * character set dependence
+
+
+@node Helper programs for gettext
+@subsection Programs to handle message catalogs for @code{gettext}
+
+@c Describe:
+@c * msgfmt
+@c * xgettext
+@c Mention:
+@c * other programs from GNU gettext
diff --git a/manual/search.texi b/manual/search.texi
index 26a8f82..abb93bb 100644
--- a/manual/search.texi
+++ b/manual/search.texi
@@ -1,4 +1,4 @@
-@node Searching and Sorting, Pattern Matching, Locales, Top
+@node Searching and Sorting, Pattern Matching, Message Translation, Top
@chapter Searching and Sorting
This chapter describes functions for searching and sorting arrays of
diff --git a/manual/startup.texi b/manual/startup.texi
index e61a755..fab74ed 100644
--- a/manual/startup.texi
+++ b/manual/startup.texi
@@ -278,9 +278,9 @@ character, since this is assumed to terminate the string.
@menu
* Environment Access:: How to get and set the values of
- environment variables.
+ environment variables.
* Standard Environment:: These environment variables have
- standard interpretations.
+ standard interpretations.
@end menu
@node Environment Access
@@ -290,7 +290,9 @@ character, since this is assumed to terminate the string.
The value of an environment variable can be accessed with the
@code{getenv} function. This is declared in the header file
-@file{stdlib.h}.
+@file{stdlib.h}. All of the following functions can be safely used in
+multi-threaded programs. The library ensures that concurrent
+modifications to the environment do not lead to errors.
@pindex stdlib.h
@comment stdlib.h
@@ -314,11 +316,62 @@ definition is added to the environment. Otherwise, the @var{string} is
interpreted as the name of an environment variable, and any definition
for this variable in the environment is removed.
-The GNU library provides this function for compatibility with SVID; it
-may not be available in other systems.
+This function is part of the extended Unix interface. Since it was also
+available in old SVID libraries you should define either
+@code{_XOPEN_SOURCE} or @code{_SVID_SOURCE} before including any header.
+@end deftypefun
+
+
+@comment stdlib.h
+@comment BSD
+@deftypefun int setenv (const char *@var{name}, const char *@var{value}, int @var{replace})
+The @code{setenv} function can be used to add a new definition to the
+environment. The entry with the name @var{name} is set to the
+value @samp{@var{name}=@var{value}}. Please note that this is also true
+if @var{value} is the empty string. A null pointer for the @var{value}
+parameter is invalid. If the environment already contains an entry with
+key @var{name} the @var{replace} parameter controls the action. If
+@var{replace} is zero, nothing happens. Otherwise the old entry is
+replaced by the new one.
+
+Please note that you cannot remove an entry completely using this function.
+
+This function is part of the BSD library. The GNU C Library provides
+this function for compatibility but it may not be available on other
+systems.
+@end deftypefun
+
+@comment stdlib.h
+@comment BSD
+@deftypefun void unsetenv (const char *@var{name})
+Using this function one can remove an entry completely from the
+environment. If the environment contains an entry with the key
+@var{name} this whole entry is removed. A call to this function is
+equivalent to a call to @code{putenv} with a string that consists only
+of @var{name}, without a @samp{=@var{value}} part.
+
+This function is part of the BSD library. The GNU C Library provides
+this function for compatibility but it may not be available on other
+systems.
+@end deftypefun
+
+There is one more function to modify the whole environment. This
+function is said to be used in POSIX.9 (POSIX bindings for Fortran
+77) and so one would expect it to have made it into POSIX.1. But this
+never happened. We still provide this function as a GNU extension
+to enable writing standard compliant Fortran environments.
+
+@comment stdlib.h
+@comment GNU
+@deftypefun int clearenv (void)
+The @code{clearenv} function removes all entries from the environment.
+Using @code{putenv} and @code{setenv} new entries can be added again
+later.
+
+If the function is successful it returns @code{0}. Otherwise the return
+value is nonzero.
@end deftypefun
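+
+The behavior of @code{setenv}, @code{unsetenv} and @code{clearenv}
+described above can be exercised with a short self-check like the
+following sketch. The function name and the variable @code{GREETING}
+are made up for this example, and defining @code{_GNU_SOURCE} is
+assumed to be sufficient to get the @code{clearenv} declaration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   /* Assumed to make clearenv visible.  */
#endif
#include <stdlib.h>
#include <string.h>

/* Hypothetical self-check: returns 0 if the calls behave as described
   in the text, -1 otherwise.  It empties the environment at the end.  */
int
exercise_environment (void)
{
  const char *value;

  if (setenv ("GREETING", "hello", 1) != 0)
    return -1;

  /* replace == 0: the existing entry must be left alone.  */
  if (setenv ("GREETING", "hallo", 0) != 0)
    return -1;
  value = getenv ("GREETING");
  if (value == NULL || strcmp (value, "hello") != 0)
    return -1;

  /* replace != 0: the old entry is overwritten.  */
  if (setenv ("GREETING", "hallo", 1) != 0)
    return -1;
  value = getenv ("GREETING");
  if (value == NULL || strcmp (value, "hallo") != 0)
    return -1;

  /* unsetenv removes the entry completely.  */
  unsetenv ("GREETING");
  if (getenv ("GREETING") != NULL)
    return -1;

  /* clearenv removes every entry, including a freshly added one.  */
  if (setenv ("GREETING", "hello", 1) != 0 || clearenv () != 0)
    return -1;
  return getenv ("GREETING") == NULL ? 0 : -1;
}
```

+
+Since the last step wipes the whole environment, such a check belongs
+in a test program and not in code that still needs variables like
+@code{PATH}.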
-@c !!! BSD function setenv
You can deal directly with the underlying representation of environment
objects to add more variables to the environment (for example, to
@@ -444,6 +497,14 @@ attribute category environment variables, or for the @code{LANG}
environment variable.
@end ignore
+@item LC_ALL
+@cindex LC_ALL environment variable
+
+If this environment variable is set it overrides the selection for all
+the locales done using the other @code{LC_*} environment variables. The
+value of the other @code{LC_*} environment variables is simply ignored
+in this case.
+
@item LC_COLLATE
@cindex LC_COLLATE environment variable
@@ -455,6 +516,12 @@ This specifies what locale to use for string sorting.
This specifies what locale to use for character sets and character
classification.
+@item LC_MESSAGES
+@cindex LC_MESSAGES environment variable
+
+This specifies what locale to use for printing messages and parsing
+responses.
+
@item LC_MONETARY
@cindex LC_MONETARY environment variable
@@ -470,6 +537,12 @@ This specifies what locale to use for formatting numbers.
This specifies what locale to use for formatting date/time values.
+@item NLSPATH
+@cindex NLSPATH environment variable
+
+This specifies the directories in which the @code{catopen} function
+looks for message translation catalogs.
+
@item _POSIX_OPTION_ORDER
@cindex _POSIX_OPTION_ORDER environment variable.
diff --git a/manual/string.texi b/manual/string.texi
index 46101de..07ed35b 100644
--- a/manual/string.texi
+++ b/manual/string.texi
@@ -365,7 +365,7 @@ string has the same limitations as any block of memory allocated using
@code{alloca}.
For obvious reasons @code{strdupa} is implemented only as a macro. I.e.,
-you cannot get the address of this function. Despite this limitations
+you cannot get the address of this function. Despite this limitation
it is a useful function. The following code shows a situation where
using @code{malloc} would be a lot more expensive.
@@ -374,7 +374,7 @@ using @code{malloc} would be a lot more expensive.
@end smallexample
Please note that calling @code{strtok} using @var{path} directly is
-illegal.
+invalid.
This function is only available if GNU CC is used.
@end deftypefun