-rw-r--r-- ChangeLog             1
-rw-r--r-- manual/message.texi 314
2 files changed, 310 insertions, 5 deletions
diff --git a/ChangeLog b/ChangeLog
index 060633e..667072d 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -28,6 +28,7 @@
* intl/po2test.sed: New file.
* intl/tst-gettext.c: New file.
* intl/tst-gettext.sh: New file.
+ * manual/message.texi: Document new interfaces.
* intl/gettext.c: Call __dcgettext directly.
diff --git a/manual/message.texi b/manual/message.texi
index 294b9f0..232f087 100644
--- a/manual/message.texi
+++ b/manual/message.texi
@@ -226,7 +226,7 @@ When an error occured the global variable @var{errno} is set to
@item EBADF
The catalog does not exist.
@item ENOMSG
-The set/message ttuple does not name an existing element in the
+The set/message tuple does not name an existing element in the
message catalog.
@end table
@@ -470,7 +470,7 @@ This is the interface defined in the X/Open standard. If no
@var{Input-File} parameter is given input will be read from standard
input. Multiple input files will be read as if they are concatenated.
If @var{Output-File} is also missing, the output will be written to
-standard output. To provide the interface one is used from other
+standard output. To provide the interface one is used to from other
programs a second interface is provided.
@smallexample
@@ -604,10 +604,10 @@ gencat -H ex.h -o ex.cat ex.msg
This generates a header file with the following content:
@smallexample
-#define SetTwoSet 0x2 /* u.msg:8 */
+#define SetTwoSet 0x2 /* ex.msg:8 */
-#define SetOneSet 0x1 /* u.msg:4 */
-#define SetOnetwo 0x2 /* u.msg:6 */
+#define SetOneSet 0x1 /* ex.msg:4 */
+#define SetOnetwo 0x2 /* ex.msg:6 */
@end smallexample
As can be seen the various symbols given in the source file are mangled
@@ -768,6 +768,8 @@ categories:
@menu
* Translation with gettext:: What has to be done to translate a message.
* Locating gettext catalog:: How to determine which catalog to be used.
+* Advanced gettext functions:: Additional functions for more complicated
+ situations.
* Using gettextized software:: The possibilities of the user to influence
the way @code{gettext} works.
@end menu
@@ -800,6 +802,8 @@ the @file{libintl.h} header file. On systems where these functions are
not part of the C library they can be found in a separate library named
@file{libintl.a} (or accordingly different for shared libraries).
+@comment libintl.h
+@comment GNU
@deftypefun {char *} gettext (const char *@var{msgid})
The @code{gettext} function searches the currently selected message
catalogs for a string which is equal to @var{msgid}. If there is such a
@@ -845,6 +849,8 @@ uses the @code{gettext} functions but since it must not depend on a
currently selected default message catalog it must specify all ambiguous
information.
+@comment libintl.h
+@comment GNU
@deftypefun {char *} dgettext (const char *@var{domainname}, const char *@var{msgid})
The @code{dgettext} functions acts just like the @code{gettext}
function. It only takes an additional first argument @var{domainname}
@@ -857,6 +863,8 @@ As for @code{gettext} the return value type is @code{char *} which is an
anachronism. The returned string must never be modified.
@end deftypefun
+@comment libintl.h
+@comment GNU
@deftypefun {char *} dcgettext (const char *@var{domainname}, const char *@var{msgid}, int @var{category})
The @code{dcgettext} adds another argument to those which
@code{dgettext} takes. This argument @var{category} specifies the last
@@ -990,6 +998,8 @@ domain named @code{foo}. The important point is that at any time
exactly one domain is active. This is controlled with the following
function.
+@comment libintl.h
+@comment GNU
@deftypefun {char *} textdomain (const char *@var{domainname})
The @code{textdomain} function sets the default domain, which is used in
all future @code{gettext} calls, to @var{domainname}. Please note that
@@ -1019,6 +1029,8 @@ This possibility is questionable to use since the domain @code{messages}
really never should be used.
@end deftypefun
+@comment libintl.h
+@comment GNU
@deftypefun {char *} bindtextdomain (const char *@var{domainname}, const char *@var{dirname})
The @code{bindtextdomain} function can be used to specify the directory
which contains the message catalogs for domain @var{domainname} for the
@@ -1056,6 +1068,298 @@ variable @var{errno} is set accordingly.
@end deftypefun
+@node Advanced gettext functions
+@subsubsection Additional functions for more complicated situations
+
+The functions of the @code{gettext} family described so far (and all the
+@code{catgets} functions as well) have one problem in the real world
+which has been neglected completely in all existing approaches: the
+handling of plural forms.
+
+Looking through Unix source code from before the time anybody thought
+about internationalization (and, sadly, even afterwards), one can often
+find code similar to the following:
+
+@smallexample
+ printf ("%d file%s deleted", n, n == 1 ? "" : "s");
+@end smallexample
+
+@noindent
+After the first complaints from people internationalizing the code,
+people either avoided formulations like this completely or used strings
+like @code{"file(s)"}. Both look unnatural and should be avoided. The
+first attempts to solve the problem correctly looked like this:
+
+@smallexample
+  if (n == 1)
+    printf ("%d file deleted", n);
+  else
+    printf ("%d files deleted", n);
+@end smallexample
+
+But this does not solve the problem. It helps languages where the
+plural form of a noun is not simply constructed by adding an `s', but
+that is all. Once again people fell into the trap of believing that the
+rules their language uses are universal. But the handling of plural
+forms differs widely between language families (and even inside them).
+There are two things in which languages can differ:
+
+@itemize @bullet
+@item
+The way plural forms are built differs. This is a problem with
+languages which have many irregularities. German, for instance, is a
+drastic case. Though English and German are part of the same language
+family (Germanic), the almost regular formation of plural noun forms
+(appending an `s') is hardly found in German.
+
+@item
+The number of plural forms differs. This is somewhat surprising for
+those who only have experience with Romance and Germanic languages,
+since there the number is the same (there are two).
+
+But other language families have only one form or many forms. More
+information on this is given in a separate section.
+@end itemize
+
+The consequence of this is that application writers should not try to
+solve the problem in their code. This would be localization, since it
+only works for certain hardcoded language environments. Instead, the
+extended @code{gettext} interface should be used.
+
+Instead of the single key string, these extra functions take two
+strings and a numerical argument. The idea behind this is that, using
+the first string as the key and the numerical argument, the
+implementation can select the right plural form according to rules
+specified by the translator. The two string arguments are then used to
+provide a return value in case no message catalog is found (similar to
+the normal @code{gettext} behaviour). In this case the rules for
+Germanic languages are used: the first string argument is assumed to be
+the singular form and the second the plural form.
+
+This has the consequence that programs without language catalogs can
+display the correct strings only if the program itself is written using
+a Germanic language. This is a limitation, but since the GNU C library
+(as well as the GNU @code{gettext} package) is written as part of the
+GNU project and the GNU coding standards require programs to be written
+in English, this solution nevertheless fulfills its purpose.
+
+@comment libintl.h
+@comment GNU
+@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
+The @code{ngettext} function is similar to the @code{gettext} function
+in that it finds the message catalogs in the same way, but it takes two
+extra arguments. The @var{msgid1} parameter must contain the singular
+form of the string to be converted. It is also used as the key for the
+search in the catalog. The @var{msgid2} parameter is the plural form.
+The parameter @var{n} is used to determine the plural form. If no
+message catalog is found, @var{msgid1} is returned if @code{n == 1},
+otherwise @var{msgid2}.
+
+An example of the use of this function is:
+
+@smallexample
+ printf (ngettext ("%d file removed", "%d files removed", n), n);
+@end smallexample
+
+Please note that the numeric value @var{n} has to be passed to the
+@code{printf} function as well. It is not sufficient to pass it only to
+@code{ngettext}.
+@end deftypefun
+
+@comment libintl.h
+@comment GNU
+@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
+The @code{dngettext} function is similar to the @code{dgettext} function
+in the way the message catalog is selected. The difference is that it
+takes two extra parameters to provide the correct plural form. These two
+parameters are handled in the same way @code{ngettext} handles them.
+@end deftypefun
+
+@comment libintl.h
+@comment GNU
+@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
+The @code{dcngettext} function is similar to the @code{dcgettext}
+function in the way the message catalog is selected. The difference is
+that it takes two extra parameters to provide the correct plural form.
+These two parameters are handled in the same way @code{ngettext} handles
+them.
+@end deftypefun
+
+@subsubheading The problem of plural forms
+
+A description of the problem can be found at the beginning of the last
+section. Now the question is how to solve it. Without the input of
+linguists (which was not available) it was not possible to determine
+whether there are only a few different ways in which plural forms are
+formed or whether the number can increase with every newly supported
+language.
+
+Therefore the solution implemented is to allow the translator to specify
+the rules for selecting the plural form. Since the formula varies with
+every language, this is the only viable solution except for hardcoding
+the information in the code (which would still require the possibility
+of extensions so as not to prevent the use of new languages). The
+details are explained in the GNU @code{gettext} manual. Here only a bit
+of information is provided.
+
+The information about the plural form selection has to be stored in the
+header entry (the one with the empty @code{msgid} string). There should
+be something like:
+
+@smallexample
+ nplurals=2; plural=n == 1 ? 0 : 1
+@end smallexample
+
+The @code{nplurals} value must be a decimal number which specifies how
+many different plural forms exist for this language. The string
+following @code{plural} is an expression using C language syntax. The
+exceptions are that no negative numbers are allowed, numbers must be
+decimal, and the only variable allowed is @code{n}. This expression
+will be evaluated whenever one of the functions @code{ngettext},
+@code{dngettext}, or @code{dcngettext} is called. The numeric value
+passed to these functions is then substituted for all uses of the
+variable @code{n} in the expression. The resulting value must then be
+greater than or equal to zero and smaller than the value of
+@code{nplurals}.
+
+@noindent
+The following rules are known at this point. The languages are listed
+together with their families, but this does not necessarily mean the
+information can be generalized to the whole family (as can easily be
+seen in the table below).@footnote{Additions are welcome. Send appropriate information to
+@email{bug-glibc-manual@@gnu.org}.}
+
+@table @asis
+@item Only one form:
+Some languages only require one single form. There is no distinction
+between the singular and plural form. An appropriate header entry
+would look like this:
+
+@smallexample
+nplurals=1; plural=0
+@end smallexample
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Finno-Ugric family
+Hungarian
+@item Asian family
+Japanese
+@item Turkic/Altaic family
+Turkish
+@end table
+
+@item Two forms, singular used for one only
+This is the form used in most existing programs since it is what English
+uses. A header entry would look like this:
+
+@smallexample
+nplurals=2; plural=n != 1
+@end smallexample
+
+(Note: this uses the feature of C expressions that boolean expressions
+have the value zero or one.)
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Germanic family
+Danish, Dutch, English, German, Norwegian, Swedish
+@item Finno-Ugric family
+Finnish
+@item Latin/Greek family
+Greek
+@item Semitic family
+Hebrew
+@item Romance family
+Italian, Spanish
+@item Artificial
+Esperanto
+@end table
+
+@item Two forms, singular used for zero and one
+This is an exceptional case in its language family. The header entry would be:
+
+@smallexample
+nplurals=2; plural=n>1
+@end smallexample
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Romance family
+French
+@end table
+
+@item Three forms, special cases for one and two
+The header entry would be:
+
+@smallexample
+nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2
+@end smallexample
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Celtic
+Gaeilge
+@end table
+
+@item Three forms, special case for one and all numbers ending in 2, 3, or 4
+The header entry would look like this:
+
+@smallexample
+nplurals=3; plural=n==1 ? 0 : n%10>=2 && n%10<=4 ? 1 : 2
+@end smallexample
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Slavic family
+Russian
+@end table
+
+@item Three forms, special case for one and some numbers ending in 2, 3, or 4
+The header entry would look like this:
+
+@smallexample
+nplurals=3; plural=n==1 ? 0 : \
+ n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
+@end smallexample
+
+(Continuation in the next line is possible.)
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Slavic family
+Polish
+@end table
+
+@item Four forms, special case for one and all numbers ending in 2, 3, or 4
+The header entry would look like this:
+
+@smallexample
+nplurals=4; plural=n==1 ? 0 : n%10==2 ? 1 : n%10==3 || n%10==4 ? 2 : 3
+@end smallexample
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Slavic family
+Slovenian
+@end table
+@end table
+
+
@node Using gettextized software
@subsubsection User influence on @code{gettext}