Imagine handling words in apps that speak many languages—it sounds simple, right?
In English, forming a plural is usually easy: you add an "s," turning "house" into "houses." But with languages like Russian or Arabic, it's a whole different story.
Each language has its own pluralization rules, so something that looks easy becomes tricky during localization and translation.
This guide explores how you can use Transifex to simplify the localization of plural strings in JavaScript apps and deliver a seamless user experience worldwide.
English has just two forms, singular and plural, but other languages handle plurals differently. Chinese and Japanese, for instance, use a single form regardless of the number, while Russian adds complexity with four plural forms instead of the familiar two.
Here’s an example of the message “We’ve built 1000 houses until now”:
In English, where two forms suffice, we create two versions of the message:
one: "We've built 1 house until now."
other: "We've built 1000 houses until now."
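Now consider Russian, with its four plural forms. To translate our message accurately, we must create four versions, one for each form:
one: "Мы построили 1 дом до настоящего момента."
few: "Мы построили 3 дома до настоящего момента."
many: "Мы построили 1000 домов до настоящего момента."
other: "Мы построили 1,5 дома до настоящего момента."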
So the challenge goes beyond a binary singular-or-plural choice. You need a solution that can select the correct plural form for any language, following that language's own pluralization rules.
Transifex builds its pluralization support on ISO standards and the Unicode Common Locale Data Repository (CLDR) Language Plural Rules.
For each language, the CLDR rules define which plural category (zero, one, two, few, many, or other) a given number falls into, which covers the wide variety of pluralization structures across languages.
For instance, languages like Russian, with four plural forms, or Arabic, with six, have distinct rules. Transifex taps into the CLDR to interpret and apply these rules accurately during the translation process.
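To make the idea concrete, here is a minimal Python sketch of how CLDR-style rules map a number to a plural category. It hand-codes the published CLDR cardinal rules for English, Russian, and Japanese only, and it is an illustration rather than how Transifex applies the rules internally. The function names `plural_category` and `_operands` are hypothetical.

```python
from decimal import Decimal


def _operands(n):
    """Return the CLDR operands i (integer digits) and v (count of visible fraction digits)."""
    d = Decimal(str(n))
    i = int(abs(d))                      # integer part, e.g. 1.5 -> 1
    v = max(0, -d.as_tuple().exponent)   # visible fraction digits, e.g. "1.50" -> 2
    return i, v


def plural_category(lang, n):
    """Pick the CLDR cardinal plural category for n (hand-coded subset of CLDR)."""
    i, v = _operands(n)
    if lang == "en":
        return "one" if i == 1 and v == 0 else "other"
    if lang == "ru":
        if v != 0:
            return "other"  # fractions such as 1.5
        if i % 10 == 1 and i % 100 != 11:
            return "one"
        if 2 <= i % 10 <= 4 and not 12 <= i % 100 <= 14:
            return "few"
        return "many"  # i % 10 is 0 or 5..9, or i % 100 is 11..14
    if lang == "ja":
        return "other"  # Japanese uses a single form
    raise ValueError(f"no rules for {lang!r} in this sketch")


for n in (1, 3, 11, 21, 1000, "1.5"):
    print(n, plural_category("en", n), plural_category("ru", n))
```

Note how 21 is "other" in English but "one" in Russian, and 1000 falls into Russian "many": this is exactly why a two-way singular/plural switch is not enough.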
In addition to CLDR, Transifex supports pluralized entries in JSON files, adhering to the International Components for Unicode (ICU) message format specifications.
ICU provides a standardized way to handle complex message formatting, including pluralization, in a language-neutral manner.
Transifex supports plurals in every file format that has them, such as Android, Apple strings, Apple stringsdict, Gettext, Java, JSON with ICU plurals, Structured JSON, XLIFF, and YAML.
Transifex also supports a subset of ICU MessageFormat in the JSON, Structured JSON, Fileless, and Java Properties formats. If your content is encoded in ICU, you can use one of these formats to import it into and export it from Transifex.
For JavaScript apps, the file-based approach in Transifex involves managing translation files, primarily in JSON format, to effectively handle plural strings. Let’s discuss this in further detail.
Transifex provides robust support for JSON files, allowing seamless integration with JavaScript applications using the ICU (International Components for Unicode) message format for plurals.
Let’s take the previous example of “We’ve built 1000 houses until now.” to understand this better:
English File:
{
"message": "We've built {count, plural, one {# house} other {# houses}} until now."
}
Russian File:
{
"message": "Мы построили {count, plural, one {# дом} few {# дома} many {# домов} other {# дома}} до настоящего момента."
}
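At runtime, an ICU-aware library resolves such a message roughly as follows: it finds the `{count, plural, ...}` argument, chooses a branch (an exact `=N` selector first, then the number's plural category, then `other`), and replaces `#` with the number. The Python sketch below illustrates that logic under simplifying assumptions: `format_plural` and `english_category` are hypothetical helpers, only one plural argument per message is resolved, and nested arguments are not supported. The category function is passed in, defaulting to the English integer rule.

```python
import re

# Matches one branch selector such as "one {", "few {" or "=0 {".
_SELECTOR = re.compile(r"\s*(=?\w+)\s*\{")


def english_category(n):
    """CLDR English rule for integers: 'one' for 1, 'other' for everything else."""
    return "one" if n == 1 else "other"


def _match_brace(text, start):
    """Return the index of the '}' that closes the '{' at text[start]."""
    depth = 0
    for pos in range(start, len(text)):
        if text[pos] == "{":
            depth += 1
        elif text[pos] == "}":
            depth -= 1
            if depth == 0:
                return pos
    raise ValueError("unbalanced braces in message")


def format_plural(message, var, n, category=english_category):
    """Resolve one '{var, plural, ...}' argument in an ICU message for the number n."""
    head = re.search(r"\{\s*" + re.escape(var) + r"\s*,\s*plural\s*,", message)
    if head is None:
        return message  # no plural argument for this variable
    end = _match_brace(message, head.start())
    body = message[head.end():end]

    # Collect "selector {text}" pairs, e.g. {"one": "# house", "other": "# houses"}.
    branches, pos = {}, 0
    while (sel := _SELECTOR.match(body, pos)) is not None:
        close = _match_brace(body, sel.end() - 1)
        branches[sel.group(1)] = body[sel.end():close]
        pos = close + 1

    # An exact "=N" selector wins; otherwise use the plural category, then "other".
    if f"={n}" in branches:
        text = branches[f"={n}"]
    else:
        text = branches.get(category(n), branches["other"])
    # "#" inside the chosen branch stands for the number itself.
    return message[:head.start()] + text.replace("#", str(n)) + message[end + 1:]


msg = "We've built {count, plural, one {# house} other {# houses}} until now."
print(format_plural(msg, "count", 1))
print(format_plural(msg, "count", 1000))
```

Passing a Russian category function instead of the default would select among the `one`, `few`, `many`, and `other` branches of the Russian file in the same way.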
To calculate the hash of a JSON string accurately, you need to know how keys in nested files are expressed as paths and which characters in those keys are escaped.
Nested JSON structures involve hierarchies where values can be objects or arrays. In the context of localization, this means messages may be organized within nested levels, like categories or sections.
For instance, in the following Nested JSON Structure:
{
"Colours": ["Red", "Blue", "Green", "Yellow"],
"home": { "message": "We've built {count, plural, one {# house} other {# houses}} until now." }
}
We use a form of notation to mark each string's location:
. = a JSON nest, e.g., "home.message"
..N.. = the Nth item within a list, e.g., "Colours..0.."
Using this notation, we can represent the total path in a single string.
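As an illustration of this notation, the Python sketch below walks a parsed JSON document and yields the flat path of every string, escaping `\` and `.` in keys and joining levels the same way as the hashing algorithm in this section. It is not Transifex code; `flatten_paths` is a hypothetical name, and the `"v1.0"` key is an invented example showing why dots inside keys must be escaped.

```python
import json


def escape(key):
    """Escape backslashes first, then dots, so a literal '.' in a key is not read as nesting."""
    return key.replace("\\", "\\\\").replace(".", "\\.")


def flatten_paths(node, prefix=""):
    """Yield (path, value) for every leaf of a parsed JSON document."""
    if isinstance(node, dict):
        items = ((escape(key), value) for key, value in node.items())
    else:
        items = (("..{}..".format(index), value) for index, value in enumerate(node))
    for segment, value in items:
        if isinstance(value, dict):
            # Objects add a "." before their child keys: "home.message".
            yield from flatten_paths(value, prefix + segment + ".")
        elif isinstance(value, list):
            # List items append their own "..N.." marker: "Colours..0..".
            yield from flatten_paths(value, prefix + segment)
        else:
            yield prefix + segment, value


doc = json.loads("""
{
  "Colours": ["Red", "Blue", "Green", "Yellow"],
  "home": {"message": "We've built {count, plural, one {# house} other {# houses}} until now."},
  "v1.0": {"title": "Release notes"}
}
""")
for path, value in flatten_paths(doc):
    print(path, "->", value)
```

The last entry comes out as `v1\.0.title`: the escaped dot belongs to the key, while the unescaped dot marks nesting.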
If your JSON file is nested, calculate the string's path, including its nested notation. This path ensures translators can accurately identify and work on specific strings within the nested structure.
Use the following algorithm (Python) to implement this:
from hashlib import md5


def escape(key):
    # Escape backslashes first, then dots, so literal dots in keys
    # are not confused with the "." nesting separator.
    key = key.replace('\\', r'\\')
    return key.replace('.', r'\.')


def generate_hashes_with_strings(nest_value, nest_key='', order=0):
    # Are we now looking at a list or a dict?
    if isinstance(nest_value, dict):
        iter_tuple = nest_value.items()
        in_list = False
    else:
        iter_tuple = enumerate(nest_value)
        in_list = True

    # Loop through each element and re-call this function
    # if it's a list or a dict.
    for key, value in iter_tuple:
        if not in_list:
            escaped_key = escape(key)
        else:
            escaped_key = '..{}..'.format(key)

        if isinstance(value, dict):
            # Objects: append the key plus a "." separator, e.g. "home."
            new_nest = '{}{}{}'.format(nest_key, escaped_key, '.')
            yield from generate_hashes_with_strings(value, new_nest, order)
        elif isinstance(value, list):
            # Lists: each child item adds its own "..N.." marker.
            new_nest = '{}{}'.format(nest_key, escaped_key)
            yield from generate_hashes_with_strings(value, new_nest, order)
        else:
            # Leaf string: hash the full path joined with an empty second
            # field, i.e. md5("home.message:").
            entity_key = '{}{}'.format(nest_key, escaped_key)
            keys = [entity_key, '']
            hashed_keys = md5(':'.join(keys).encode('utf-8')).hexdigest()
            yield hashed_keys, value
        order += 1
Note that you must escape all \ and . characters before calculating the hash of a JSON string.
Unlike plain JSON, Structured JSON lets you attach metadata such as context, developer_comment, and character_limit to each string. This metadata gives translators valuable guidance during localization and improves collaboration between developers and translators.
Structured JSON is itself JSON-based. Each entry, including a plural entry, needs two parts: a key that identifies the string, and a "string" field that holds the translatable text:
{
  "Provide_the_key": {
    "string": "text you want to translate"
  }
}
Plural strings follow the same structure, with the ICU plural message as the value of "string", e.g.:
{
  "message": {
    "string": "We've built {count, plural, one {# house} other {# houses}} until now."
  }
}
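If you generate resource files from code, a small helper can keep entries consistent. The Python sketch below builds a Structured JSON entry using the field names mentioned above (`string`, `context`, `developer_comment`, `character_limit`), adding metadata only when it is provided. The helper name `structured_entry` and the example metadata values are hypothetical, and treating `character_limit` as an integer is an assumption.

```python
import json


def structured_entry(string, context=None, developer_comment=None, character_limit=None):
    """Build one Structured JSON entry; only 'string' is required, metadata is optional."""
    entry = {"string": string}
    if context is not None:
        entry["context"] = context
    if developer_comment is not None:
        entry["developer_comment"] = developer_comment
    if character_limit is not None:
        entry["character_limit"] = character_limit  # assumed to be an integer
    return entry


resources = {
    "message": structured_entry(
        "We've built {count, plural, one {# house} other {# houses}} until now.",
        developer_comment="count is the total number of houses built",
        character_limit=80,
    )
}
print(json.dumps(resources, indent=2, ensure_ascii=False))
```

Omitting unused metadata keeps the file minimal while still giving translators guidance where the developer supplied it.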
For pluralized strings that follow a different structure, for example:
"message": "We’ve built {number, plural, =1 {1 New}, =2