How to Localize Plurals in JavaScript for Global Audiences

Written by Siddharth Sharma | Jan 18, 2024 12:00:00 AM

Handling words in apps that speak many languages sounds simple, right?

In English, forming a plural is usually as easy as adding an "s," turning "house" into "houses." But in languages like Russian or Arabic, it's a whole different story.

Each language has its own rules for turning things into their plural form, making what seems easy a bit tricky when it comes to localization and translation.

This guide explores how you can use Transifex to simplify the localization of these plural strings for JavaScript apps, ensuring a seamless user experience worldwide.

Dealing with Different Plural Forms

English has just two forms, singular and plural, but other languages take a more nuanced approach. Chinese and Japanese, for instance, use a single form for every count, while Russian uses four plural forms rather than the familiar two.

Here’s an example of the message “We’ve built 1000 houses until now”:

In English, where two forms suffice, we create two versions of the message:

one: "We've built 1 house until now."
other: "We've built 1000 houses until now."

Now, consider Russian with its four plural forms. To ensure accurate translations for our message, we must create four unique versions:

  • one: "Мы построили 1 дом до настоящего момента."
  • few: "Мы построили 3 дома до настоящего момента."
  • many: "Мы построили 11 домов до настоящего момента."
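  • other: "Мы построили 1,5 дома до настоящего момента." (used for fractional numbers; under the CLDR rules, an integer such as 1000 takes the many form, "1000 домов")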

Hence, the challenge expands beyond the binary singular-plural choice. There is a need for a versatile solution capable of selecting the correct plural form for any given language, acknowledging the unique pluralization dynamics each language introduces.
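In JavaScript, one way to pick the right form at runtime is the built-in Intl.PluralRules API, which maps a number to a CLDR plural category for a given locale. The sketch below is illustrative only: the helper name pluralCategory is made up for this example, and the results depend on the locale data shipped with your runtime.

```javascript
// Hypothetical helper: returns the CLDR plural category for a number.
function pluralCategory(locale, n) {
  return new Intl.PluralRules(locale).select(n);
}

// Print the category each sample count falls into for English and Russian.
for (const n of [1, 3, 11, 1000]) {
  console.log(n, 'en:', pluralCategory('en', n), 'ru:', pluralCategory('ru', n));
}
```

The category name returned here is the same label (one, few, many, other) used to key each translated variant.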

How Transifex Handles Your Pluralised Strings

Transifex leverages the ISO standard and Common Locale Data Repository (CLDR) Language Plural Rules as a foundational element of its pluralization strategy. 

The CLDR rules include the various forms a string can take based on numeric values, accommodating the diverse pluralization structures across languages.

For instance, languages like Russian, with four plural forms, or Arabic, with six, have distinct rules. Transifex taps into the CLDR to interpret and apply these rules accurately during the translation process.
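To see which plural categories a locale defines, a JavaScript runtime can be asked directly. This is a small sketch, assuming a runtime with full locale data (recent Node.js and browsers); the function name pluralCategories is our own, and the order of the returned categories can vary between engine versions.

```javascript
// Hypothetical helper: lists the cardinal plural categories for a locale.
function pluralCategories(locale) {
  return new Intl.PluralRules(locale).resolvedOptions().pluralCategories;
}

// Japanese, English, Russian, and Arabic define different numbers of forms.
for (const locale of ['ja', 'en', 'ru', 'ar']) {
  console.log(locale, pluralCategories(locale).join(', '));
}
```

A translated string needs one variant per category reported for its target language.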

In addition to CLDR, Transifex supports pluralized entries in JSON files, adhering to the International Components for Unicode (ICU) message format specifications. 

ICU provides a standardized way to handle complex message formatting, including pluralization, in a language-neutral manner.

Here are the Supported Language Plural Forms in Transifex

Transifex follows ISO Standards and Unicode CLDR data for the supported languages.

  • French (all languages with French as base language)
    • Forms in Transifex: one, many, other
  • Spanish (all languages with Spanish as base language)
    • Forms in Transifex: one, many, other
  • Italian (all languages with Italian as base language)
    • Forms in Transifex: one, many, other
  • Portuguese (all languages with Portuguese as base language)
    • Forms in Transifex: one, many, other
  • Santali
    • Forms in Transifex: one, two, other
  • Cornish
    • Forms in Transifex: zero, one, two, few, many, other

Localize Plurals with the File-Based Approach

Transifex supports plurals for all file formats that support them, such as Android, Apple strings, Apple stringsdict, Gettext, Java, JSON with ICU plurals, Structured JSON, XLIFF, and YAML.

Transifex also supports a part of the ICU MessageFormat under JSON, Structured JSON, Fileless, and Java Properties. If you have content encoded in ICU, you can use one of the above types to import and export it in Transifex.

For JavaScript apps, the file-based approach in Transifex involves managing translation files, primarily in JSON format, to effectively handle plural strings. Let’s discuss this in further detail.

JSON Files

Transifex provides robust support for JSON files, allowing seamless integration with JavaScript applications using the ICU (International Components for Unicode) message format for plurals.

Let’s take the previous example of “We’ve built 1000 houses until now.” to understand this better:

English File:

{
  "message": "We've built {count, plural, one {1 house} other {# houses}} until now."
}

Russian File:

{
  "message": "Мы построили {count, plural, one {# дом} few {# дома} many {# домов} other {# дома}} до настоящего момента."
}
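To make the ICU plural syntax concrete, here is a deliberately minimal JavaScript sketch of how such a message could be resolved at runtime. It is not how Transifex or any particular library does it: formatPlural is a hypothetical function that handles only a single, non-nested plural block, and the sample messages are written out in full for illustration.

```javascript
// Minimal sketch: resolves one {arg, plural, ...} block with flat branches.
// This is not a full ICU MessageFormat parser.
function formatPlural(message, locale, values) {
  const start = message.search(/\{\s*\w+\s*,\s*plural\s*,/);
  if (start === -1) return message;

  // Find the closing brace that matches the opening brace at `start`.
  let depth = 0;
  let end = -1;
  for (let i = start; i < message.length; i++) {
    if (message[i] === '{') depth++;
    else if (message[i] === '}' && --depth === 0) { end = i; break; }
  }
  if (end === -1) throw new SyntaxError('Unbalanced braces in message');

  const inner = message.slice(start + 1, end);
  const [, argName, rest] = inner.match(/^\s*(\w+)\s*,\s*plural\s*,([\s\S]*)$/);

  // Collect branches such as `one {...}` or `=0 {...}`.
  const branches = {};
  for (const [, selector, text] of rest.matchAll(/(=\d+|\w+)\s*\{([^{}]*)\}/g)) {
    branches[selector] = text;
  }

  const n = values[argName];
  const category = new Intl.PluralRules(locale).select(n);
  // Exact matches (=N) win, then the locale's category, then `other`.
  const chosen = branches[`=${n}`] ?? branches[category] ?? branches.other;
  if (chosen === undefined) throw new Error('Missing "other" branch');

  // `#` stands for the number, formatted for the locale.
  const text = chosen.replace(/#/g, new Intl.NumberFormat(locale).format(n));
  return message.slice(0, start) + text + message.slice(end + 1);
}

const en = "We've built {count, plural, one {1 house} other {# houses}} until now.";
const ru = 'Мы построили {count, plural, one {# дом} few {# дома} many {# домов} other {# дома}} до настоящего момента.';
console.log(formatPlural(en, 'en', { count: 1 }));
console.log(formatPlural(ru, 'ru', { count: 3 }));
```

In a production app, a complete ICU MessageFormat implementation is the safer choice, since it also covers nesting, select, and quoting rules that this sketch ignores.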

How to Handle Special Characters?

Before calculating the hash of a JSON string, escape the following characters in its key:

  • \ (backslash) is escaped as \\
  • . (dot) is escaped as \.

How do You Handle Nested JSON Files?

Nested JSON structures involve hierarchies where values can be objects or arrays. In the context of localization, this means messages may be organized within nested levels, like categories or sections.

For instance, in the following Nested JSON Structure:

{
  "Colours": ["Red", "Blue", "Green", "Yellow"],
  "home": { "message": "We've built {count, plural, one {1 house} other {# houses}} until now." }
}

We use a form of notation to mark each string's location:

  • . = a JSON nest, e.g., "home.message"
  • ..N.. = the Nth item within a list, e.g., "Colours..0.."

Using this notation, we can represent the full path in a single string.

If your JSON file is nested, calculate the string's path, including its nested notation. This path ensures translators can accurately identify and work on specific strings within the nested structure.

Use the following algorithm (Python) to implement this:

from hashlib import md5


def escape(key):
    # Escape backslashes first so the ones added for dots are not doubled.
    key = key.replace('\\', r'\\')
    return key.replace('.', r'\.')


def generate_hashes_with_strings(nest_value, nest_key='', order=0):
    # Are we now looking at a list or a dict?
    if isinstance(nest_value, dict):
        iter_tuple = nest_value.items()  # Python 3 replacement for iteritems()
        in_list = False
    else:
        iter_tuple = enumerate(nest_value)
        in_list = True

    # Loop through each element and recurse if it is a list or a dict.
    for key, value in iter_tuple:
        if not in_list:
            escaped_key = escape(key)
        else:
            escaped_key = '..{}..'.format(key)

        if isinstance(value, dict):
            new_nest = '{}{}{}'.format(nest_key, escaped_key, '.')
            yield from generate_hashes_with_strings(value, new_nest, order)
        elif isinstance(value, list):
            new_nest = '{}{}'.format(nest_key, escaped_key)
            yield from generate_hashes_with_strings(value, new_nest, order)
        else:
            # Leaf string: hash "<path>:" (the path joined with an empty context).
            entity_key = '{}{}'.format(nest_key, escaped_key)
            keys = [entity_key, '']
            hashed_keys = md5(':'.join(keys).encode('utf-8')).hexdigest()
            yield hashed_keys, value
        order += 1

 

Note that you must escape all \ and . characters before calculating the hash of a JSON string.
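For a JavaScript codebase, the same path notation and hashing can be sketched with Node's built-in crypto module. This is a rough port for illustration, not an official Transifex utility: the names escapeKey and walk are our own, and the sketch assumes the hash input is the path followed by a colon, as in the Python version.

```javascript
const { createHash } = require('crypto');

// Escape backslashes first, then dots, mirroring the Python escape() helper.
function escapeKey(key) {
  return key.replace(/\\/g, '\\\\').replace(/\./g, '\\.');
}

// Hypothetical generator: yields { path, hash, value } for every leaf string.
function* walk(node, prefix = '') {
  const isList = Array.isArray(node);
  const entries = isList ? node.entries() : Object.entries(node);
  for (const [key, value] of entries) {
    const part = isList ? `..${key}..` : escapeKey(key);
    if (Array.isArray(value)) {
      // Lists add no separator; their items carry the ..N.. marker.
      yield* walk(value, prefix + part);
    } else if (value !== null && typeof value === 'object') {
      // Nested objects are joined with a dot.
      yield* walk(value, `${prefix}${part}.`);
    } else {
      const path = prefix + part;
      const hash = createHash('md5').update(`${path}:`, 'utf8').digest('hex');
      yield { path, hash, value };
    }
  }
}

const sample = {
  Colours: ['Red', 'Blue', 'Green', 'Yellow'],
  home: { message: 'Hello' },
};
for (const entry of walk(sample)) {
  console.log(entry.path, entry.hash);
}
```

Running it on the sample prints paths such as Colours..0.. and home.message, each next to its 32-character hash.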

Structured JSON Files

Unlike traditional JSON format, Structured JSON provides the flexibility to include metadata information such as context, developer_comment, and character_limit. These additions provide valuable insights and guidance during the localization process, enhancing collaboration between developers and translators.

To start localizing, use the JSON-based Structured JSON format. Each entry, including plural entries, needs two things: a key for your string, and a "string" field that holds the translatable text:

"Provide_the_key": {
  "string": "text you want to translate"
}

The same ICU plural syntax applies inside the "string" field of a Structured JSON file, e.g.:

{
  "message": {
    "string": "We've built {count, plural, one {1 house} other {# houses}} until now."
  }
}
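On the application side, a JavaScript app that only needs the translatable text could reduce a Structured JSON file to a plain key-to-message map when it loads. The sketch below assumes every entry is an object with a "string" field plus optional metadata; toMessageMap and the sample metadata values are hypothetical.

```javascript
// Hypothetical helper: keeps only the translatable "string" of each entry.
// Assumes entries look like { "string": "...", ...optional metadata }.
function toMessageMap(structured) {
  const messages = {};
  for (const [key, entry] of Object.entries(structured)) {
    if (entry && typeof entry.string === 'string') {
      messages[key] = entry.string;
    }
  }
  return messages;
}

const structured = {
  message: {
    string: "We've built {count, plural, one {1 house} other {# houses}} until now.",
    developer_comment: 'Illustrative comment for translators', // made-up value
    character_limit: 120, // made-up value
  },
};
console.log(toMessageMap(structured));
```

The metadata stays in the source file for translators, while the app works with the message strings alone.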

 

For pluralized strings that follow a different structure, for example:

 

<code class="language-json">"message": "We’ve built {number, plural, =1 {1 New}, =2