{translated} is a complex internationalization system made easy. Provide a directory where localization JSON files are stored and access entries from any place in your code. Various features are incorporated: recursive string interpolation, custom plural form rules, entry grouping, inheriting localization data from other dialects (see differences between British and American English for a canonical example), and more.
# Install from CRAN:
install.packages("translated")
# Install development version from GitHub:
# install.packages("devtools")
::install_github("ttscience/translated") devtools
To start using {translated} in your project, specify where to find localization JSON files:
library(translated)
# Below is the path to examples shipped with this package
<- system.file("examples", package = "translated")
path trans_path(path)
Processing localization data is done behind the curtains. The user can focus on using the entries:
trans("title")
#> [1] "Predefined number generator"
This was the localization for the default locale,
i.e. en_US
(United States English). The currently set
locale can be checked with:
trans_locale()
#> [1] "en_US"
To set a different locale (e.g. pl_PL
or pl
for short, my native language), simply pass it as an argument to the
same function. Both mentioned forms are acceptable (plus the form with
encoding included, for example pl_PL.UTF-8
), so we’ll use
the simpler one:
trans_locale("pl")
The localization immediately changes:
trans("title")
#> [1] "Generator liczb predefiniowanych"
To list all currently available locales, call:
trans_available()
#> [1] "en_UK" "en_US" "pl"
If more than one key is supplied, an equal amount of translations is returned. It makes it easier to translate vectors of strings, e.g. column names. Works with all other features, too.
trans(c("btn_close", "title"))
#> [1] "Zamknij" "Generator liczb predefiniowanych"
Some entries can have gaps to fill with variable values. They are
denoted with braces {}
inside translation text,
e.g. "Courtney is {age} years old."
. Pass these variables
as named parameters to trans()
function (and don’t worry,
unused parameters are ignored). Most often they’ll be strings, but
anything coercible to string is valid, especially numbers:
# JSON entry
"btn_insert": "Wstaw {number}"
trans("btn_insert", number = 4)
#> [1] "Wstaw 4"
It is up to the user to provide singular and plural forms for each
entry where it’s necessary. {translated} provides the user with an easy
method to make it work: all forms for an entry are stored in a list.
With an appropriate number-to-form converter, the user only has to
supply the number as .n
parameter.
To see details on defining rules of plural forms, see “Plurality rule definition” section.
# JSON entry
"cat": ["brak kotów", "{.n} kot", "{.n} koty", "{.n} kotów"]
trans("cat", .n = 5)
#> [1] "5 kotów"
trans("cat", .n = 1)
#> [1] "1 kot"
It can be difficult to keep track of all the entries in a large JSON file, so a grouping system comes in handy. Say, for example, that you’d created a large Shiny app with multiple modules and now you’d want it internationalized. You’d group your localization entries by module so that you find entries quicker and don’t have to worry about name clashes.
To access an entry within a group, use a "group.key"
string with dots dividing parts of the path. There is no limit to how
deep the grouping may go, so the key may as well look like
"group1.group2.group3.key"
.
And remember – never use a key with a dot, as the interpreter cannot distinguish between the two. But the easy grouping functionality is worth this slight inconvenience.
# JSON entry
"nouns": {
"behavior": "zachowanie"
}
trans("nouns.behavior")
#> [1] "zachowanie"
As it turns out, {glue} package offers much
more flexibility than just inserting predefined variables – it can
execute arbitrary code too. This isn’t too helpful on its own; you may
as well pass the result of this code as a named parameter to
trans()
function. However, this has an interesting effect
regarding nested translations.
See, it’s a common problem that a phrase may contain more than one
noun dependent on its count. You may try to cover all possible cases,
but their number grows exponentially. What you can do instead is to
split the processing logic between multiple entries, each having one
count-dependent part at most, then compound these entries using
trans()
function inside another entry. See the example
below; however, note that this is just one of the possible solutions,
perhaps not even optimal.
# JSON entry
"result": "Przeskanowałam {trans('file', .n = n_files)} w {trans('dir', .n = n_dirs)}.",
"file": ["{.n} plików", "{.n} plik", "{.n} pliki", "{.n} plików"],
"dir": ["żadnym folderze", "{.n} folderze", "{.n} folderach", "{.n} folderach"]
trans("result", n_files = 4, n_dirs = 1)
#> [1] "Przeskanowałam 4 pliki w 1 folderze."
If you have an idea for a feature that is missing from {translated}, please start an issue on our GitHub repository. Pull requests are obviously welcome as well.
Localization is to be stored in JSON files, all inside one folder. Each JSON may only hold data for one language 1, yet more than one JSON may be used for a language 2. Files do not have to be in the same folder, subdirectories are allowed (so the user may store all data for a language in its own folder, for example).
Each JSON identifies its belonging by a required component:
locale
stored under config
. The other required
component is translation
map, although there is no limit to
how many entries there must be. Furthermore, each locale should contain
plurality case assignment as plural
stored under
config
, although only one file per locale must contain
it.
There are two optional config
fields and both do not
need to be repeated as well: inherit
and
default
. The former enables inheriting translation and
plurality data from other locales, making it easier to translate
dialectal differences. The latter sets the locale as the default for its
language (e.g. in the example below, American English is used as the
default version of English).
To sum it up, an example JSON structure is shown below:
{
"config": {
"locale": "en_US",
"plural": "n == 0 ~ 1, n == 1 ~ 2, TRUE ~ 3",
"inherit": "en_UK",
"default": true
},
"translation": {
"key": "value",
"plural_key": ["case_1", "case_2", "case_3"],
"group": {
"key2": "value2"
}
}
}
A plurality rule consists of a set of sequentially-evaluated cases.
They are separated by commas (,
). Each rule has two
components: condition and value. Should the condition evaluate to
TRUE
, its value is returned, else the next rule is
tested.
Values are placed to the right of their conditions, separated by a
tilde (~
), meaning that conditions should not use tilde in
their code. This should not be a problem, however, as only a few simple
operations should suffice to create an appropriate set of rules for any
language.
A condition has access to one variable: n
. This is the
count that influences the phrase form. After evaluation, a condition
should return a single logical value, either TRUE
or
FALSE
.
Now that we’ve discussed rule structure, let’s break down two examples, starting with the one from above JSON:
"n == 0 ~ 1, n == 1 ~ 2, TRUE ~ 3"
There are three cases in this example. The first case matches if
n
is equal to 0, returning 1. The second matches with 1
(for a singular form), returning 2. The last case is a “catch-all”,
returning 3 for all inputs that did not match previous cases (i.e. any
number other than 0 or 1).
Analogous rule for Polish is much more complicated:
"n == 0 ~ 1, n == 1 ~ 2, n %% 100 %in% 12:14 ~ 4, n %% 10 %in% 2:4 ~ 3, TRUE ~ 4"
Oof, this is more than double the length of English rule. But it’s not that difficult in its essence.
First two cases are already known to you, they match 0s and 1s.
There’s also a “catch-all” at the end, returning 4. And the part
inbetween? It returns 3 for all numbers returning 2, 3 or 4 modulo 10
except 3 numbers that return 12, 13 or 14
modulo 100. That modulo operator %%
and inclusion operator
%is%
are the other two key operations.
Of course, any logical operator could be applied as well, so instead of writing
n %% 100 %in% 12:14 ~ 4, n %% 10 %in% 2:4 ~ 3
,
one could write
(n %% 10 %in% 2:4) && !(n %% 100 %in% 12:14) ~ 3
.
However, it is best to stick to the simplest statements possible.
{translated} is meant to be lightweight, so only the key packages are imported:
Storing data in separate files for each language is in line with best practices. Usually there’ll be a different translator assigned for each language and having the data split between files gets rid of the need for merging files.↩︎
Especially valuable in huge projects with a lot of text, as it enables grouping data in files by topic or module.↩︎
“Except” means that the following case should precede the previous one.↩︎