Diffstat (limited to 'manual')
-rw-r--r-- manual/message.texi 132
1 files changed, 122 insertions, 10 deletions
diff --git a/manual/message.texi b/manual/message.texi
index 7640e21..76587d1 100644
--- a/manual/message.texi
+++ b/manual/message.texi
@@ -1167,19 +1167,131 @@ translations from the file
@end smallexample
From the above descriptions it should be clear which component of this
-filename is determined fromby which source.
+filename is determined by which source.
+
+In the above example we assumed that the @code{LANGUAGE} environment
+variable is set to @code{de}. This might be an appropriate selection, but
+what happens if the user wants to use @code{LC_ALL} because of its wider
+usability, and the required value there is @code{de_DE.ISO-8859-1}? We
+already mentioned above that a situation like this is not infrequent.
+E.g., a person might prefer reading a dialect and, if this is not
+available, fall back on the standard language.
+
+The @code{gettext} functions know about situations like this and can
+handle them gracefully. The functions recognize the format of the value
+of the environment variable. They can split the value into different
+pieces and, by leaving out one part or another, construct new
+values. This happens of course in a predictable way. To understand
+this one must know the format of the environment variable value. There
+are two more or less standardized forms:
+
+@table @emph
+@item X/Open Format
+@code{language[_territory[.codeset]][@@modifier]}
+
+@item CEN Format (European Community Standard)
+@code{language[_territory][+audience][+special][,[sponsor][_revision]]}
+@end table
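As an illustration of the X/Open form only, the following is a minimal sketch of how such a value could be split into its fields. The names @code{struct xopen_locale}, @code{copy_part} and @code{split_xopen_locale} are hypothetical and are not part of the C library; the fixed field size of 32 bytes is likewise an assumption. The sketch follows the grammar literally, so a codeset is only recognized after a territory.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the four X/Open fields; an absent field
   is left as the empty string.  */
struct xopen_locale
{
  char language[32];
  char territory[32];
  char codeset[32];
  char modifier[32];
};

/* Copy the characters in [START, END) into DST, truncating if needed.  */
static void
copy_part (char *dst, size_t size, const char *start, const char *end)
{
  size_t n = (size_t) (end - start);
  if (n >= size)
    n = size - 1;
  memcpy (dst, start, n);
  dst[n] = '\0';
}

/* Split NAME according to language[_territory[.codeset]][@modifier].  */
static void
split_xopen_locale (const char *name, struct xopen_locale *out)
{
  const char *end, *at, *body_end, *underscore, *dot;

  assert (name != NULL && out != NULL);
  memset (out, 0, sizeof *out);

  /* The modifier, if any, is everything after the first '@'.  */
  end = name + strlen (name);
  at = strchr (name, '@');
  body_end = at != NULL ? at : end;
  if (at != NULL)
    copy_part (out->modifier, sizeof out->modifier, at + 1, end);

  /* Without '_' the whole remaining part is the language.  */
  underscore = memchr (name, '_', (size_t) (body_end - name));
  if (underscore == NULL)
    {
      copy_part (out->language, sizeof out->language, name, body_end);
      return;
    }
  copy_part (out->language, sizeof out->language, name, underscore);

  /* The codeset is only looked for after the territory.  */
  dot = memchr (underscore + 1, '.',
                (size_t) (body_end - (underscore + 1)));
  if (dot == NULL)
    {
      copy_part (out->territory, sizeof out->territory,
                 underscore + 1, body_end);
      return;
    }
  copy_part (out->territory, sizeof out->territory, underscore + 1, dot);
  copy_part (out->codeset, sizeof out->codeset, dot + 1, body_end);
}
```

Called with @code{de_DE.ISO-8859-1}, this sketch fills in @code{de}, @code{DE} and @code{ISO-8859-1} and leaves the modifier empty.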
+
+The functions will automatically recognize which format is used. Less
+specific locale names are then derived by stripping off parts in the
+order of the following list:
-@c Describe:
-@c * message inheritence
-@c * locale aliasing
-@c * character set dependence
+@enumerate
+@item
+@code{revision}
+@item
+@code{sponsor}
+@item
+@code{special}
+@item
+@code{codeset}
+@item
+@code{normalized codeset}
+@item
+@code{territory}
+@item
+@code{audience}/@code{modifier}
+@end enumerate
+
+From the last entry one can see that the @code{modifier} field in the
+X/Open format and the @code{audience} field in the CEN format have the
+same meaning. Besides, one can see that the @code{language} field for
+obvious reasons will never be dropped.
+
+The only new thing is the @code{normalized codeset} entry. This is
+another goodie which is introduced to help reduce the chaos which
+derives from the inability of people to standardize the names of
+character sets. Instead of @w{ISO-8859-1} one can often see @w{8859-1},
+@w{88591}, @w{iso8859-1}, or @w{iso_8859-1}. The @code{normalized
+codeset} value is generated from the user-provided character set name by
+applying the following rules:
+
+@enumerate
+@item
+Remove all characters except digits and letters.
+@item
+Fold letters to lowercase.
+@item
+If the result contains only digits, prepend the string @code{"iso"}.
+@end enumerate
+
+@noindent
+So all of the above names will be normalized to @code{iso88591}. This
+gives the program user much more freedom in choosing the locale name.
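The three normalization rules can be sketched in C as follows. This is an illustration only, not the routine the library uses internally; the function name @code{normalize_codeset} and the choice to return a freshly allocated string (which the caller must free) are assumptions made for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated normalized form of
   NAME, or NULL if memory is exhausted.  */
static char *
normalize_codeset (const char *name)
{
  size_t len, i;
  char *result, *out;
  int only_digits = 1;

  assert (name != NULL);
  len = strlen (name);

  /* Room for an optional "iso" prefix, the kept characters and NUL.  */
  result = malloc (len + 4);
  if (result == NULL)
    return NULL;

  /* Write after a three-byte gap reserved for the prefix.  */
  out = result + 3;
  for (i = 0; i < len; ++i)
    {
      unsigned char c = (unsigned char) name[i];
      /* Rule 1: keep only digits and letters.  */
      if (isalnum (c))
        {
          if (!isdigit (c))
            only_digits = 0;
          /* Rule 2: fold letters to lowercase.  */
          *out++ = (char) tolower (c);
        }
    }
  *out = '\0';

  if (only_digits)
    /* Rule 3: only digits remain, so fill the gap with "iso".  */
    memcpy (result, "iso", 3);
  else
    /* No prefix needed: close the gap, including the NUL.  */
    memmove (result, result + 3, (size_t) (out - (result + 3)) + 1);

  return result;
}
```

With this sketch, each of @code{ISO-8859-1}, @code{8859-1}, @code{88591}, @code{iso8859-1} and @code{iso_8859-1} yields @code{iso88591}.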
+
+Even this extended functionality still does not help to solve the
+problem that completely different names can be used to denote the same
+locale (e.g., @code{de} and @code{german}). To be of help in this
+situation the locale implementation and also the @code{gettext}
+functions know about aliases.
+
+The file @file{/usr/share/locale/locale.alias} (replace @file{/usr} with
+whatever prefix you used for configuring the C library) contains a
+mapping of alternative names to more regular names. The system manager
+is free to add new entries to meet her/his own needs. The selected
+locale from the environment is compared with the entries in the first
+column of this file, ignoring case. If they match, the value of the
+second column is used instead for the further handling.
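The lookup described above might be sketched like this. The function name @code{lookup_alias} is hypothetical, and the sketch assumes two whitespace-separated columns per line, @code{#} as a comment marker, and lines and fields short enough for the fixed buffers; it uses the POSIX function @code{strcasecmp} for the case-insensitive comparison and is not the library's own parser.

```c
#include <assert.h>
#include <stdio.h>
#include <strings.h>

/* Hypothetical helper: search the alias file open on FP for LOCALE.
   On a match store the second column in VALUE (at most SIZE bytes,
   always NUL-terminated) and return 1; otherwise return 0.  */
static int
lookup_alias (FILE *fp, const char *locale, char *value, size_t size)
{
  char line[512];

  assert (fp != NULL && locale != NULL && value != NULL && size > 0);

  while (fgets (line, sizeof line, fp) != NULL)
    {
      char alias[256], target[256];

      /* Skip blank lines, lines with fewer than two columns, and
         comment lines (assumed to start with '#').  */
      if (sscanf (line, "%255s %255s", alias, target) != 2
          || alias[0] == '#')
        continue;

      /* The first column is compared ignoring case.  */
      if (strcasecmp (alias, locale) == 0)
        {
          snprintf (value, size, "%s", target);
          return 1;
        }
    }
  return 0;
}
```

A caller would open the alias file with @code{fopen}, pass the locale name taken from the environment, and continue with the returned value if the function reports a match.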
+
+In the description of the format of the environment variables we already
+mentioned the character set as a factor in the selection of the message
+catalog. In fact, only catalogs which contain text written using the
+character set of the system/program can be used (directly; there will
+come a solution for this some day). This means that the user will
+always have to take care of this. If the collection of message
+catalogs contains files for the same language but coded using
+different character sets, the user has to be careful.
@node Helper programs for gettext
@subsection Programs to handle message catalogs for @code{gettext}
-@c Describe:
-@c * msgfmt
-@c * xgettext
-@c Mention:
-@c * other programs from GNU gettext
+The GNU C Library does not contain the source code for the programs to
+handle message catalogs for the @code{gettext} functions. As part of
+the GNU project the GNU gettext package contains everything the
+developer needs. The functionality provided by the tools in this
+package by far exceeds the abilities of the @code{gencat} program
+described above for the @code{catgets} functions.
+
+The @code{msgfmt} program is the equivalent of the @code{gencat}
+program. It generates from the human-readable and -editable form of
+the message catalog a binary file which can be used by the
+@code{gettext} functions. But there are several more programs
+available.
+
+The @code{xgettext} program can be used to automatically extract the
+translatable messages from a source file. I.e., the programmer need not
+take care of the translations and the list of messages which have to be
+translated. S/He will simply wrap the translatable strings in calls to
+@code{gettext} et al. and the rest will be done by @code{xgettext}. This
+program has a lot of options which help to customize the output or
+help it to understand the input better.
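A small sketch of what such wrapping looks like in a source file follows. The function names @code{greeting} and @code{is_translated} and the message text are made up for this example; only @code{gettext} itself comes from @file{libintl.h}. When no catalog is found for the current locale, @code{gettext} returns its argument unchanged.

```c
#include <libintl.h>
#include <string.h>

/* Hypothetical function: the string literal is wrapped in a call to
   gettext, so xgettext can find it and the translation, if one is
   available, is looked up at run time.  */
static const char *
greeting (void)
{
  return gettext ("Hello, world!");
}

/* Hypothetical helper: report whether a translation different from
   MSGID was found in the current message catalog.  */
static int
is_translated (const char *msgid)
{
  return strcmp (gettext (msgid), msgid) != 0;
}
```

Assuming the file is called @file{hello.c}, running @code{xgettext -o hello.pot hello.c} would collect the wrapped string into a template, and @code{msgfmt -o hello.mo de.po} would compile a finished translation; both file names are placeholders.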
+
+Other programs help to manage the development cycle when new messages
+appear in the source files or when a new translation of the messages
+appears. Here it should only be noted that using all the tools in GNU
+gettext it is possible to @emph{completely} automate the handling of
+message catalogs. Besides marking the translatable strings in the source
+code and generating the translations the developers do not have anything
+to do themselves.