Eating Your Own Dog Food: Managing a Continuous Localization Project

transifex
October 31, 2013

In this post we’ll describe how we, the Transifex team, use Transifex itself to localize our own product. Though we focus on our specific workflow, we hope you will become more familiar with handling localization workflows with the tools and frameworks we use (namely gettext, Django, and Git). That said, the same principles and procedures should work with any tool or framework, and even with non-software products.

Note: Since we’re talking about translating Transifex with Transifex, things might get a little confusing whenever the word ‘Transifex’ appears (Transifex!). This is the fun part, however. It’s what “eating your own dog food” means after all. Try not to get confused.

Some preliminary basics

Gettext

Gettext is one of the most popular open-source libraries for software localization. It allows developers to mark strings in the code as ‘translatable’.

In Python, it is done like this:

from gettext import gettext as _
translatable_string = _(u'A sentence to translate.')

This has the following effects:

  1. It allows the ‘xgettext’ command-line program to find these strings in your code and extract them into a .po plaintext file. This file, known as a ‘source language file’, can be given to translators to produce additional language files, one for each language.
  2. Once language files for several languages are available in the code directory structure, the ‘msgfmt’ command compiles them into binary files (which use the ‘.mo’ extension). When the actual product renders the strings that were marked as translatable with the ‘gettext’ call, it uses the information in the compiled files, cross-checks with the current language selection of the program, and displays the appropriate translated version.
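
To see the lookup side of this in isolation, here is a minimal sketch using Python's standard gettext module. The domain name 'messages', the helper name get_translator and the empty temporary locale directory are assumptions made for illustration. With no compiled .mo file available and fallback enabled, the call simply returns the source string unchanged, which is also what users see for any string that has not been translated yet.

```python
import gettext
import tempfile


def get_translator(language, localedir, domain="messages"):
    """Return a gettext function for `language`, falling back to the source text."""
    catalog = gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True
    )
    return catalog.gettext


# An empty directory stands in for a project that has no compiled translations yet.
empty_localedir = tempfile.mkdtemp()
_ = get_translator("el", empty_localedir)
greeting = _("A sentence to translate.")
print(greeting)
```

Once a compiled catalog for the requested language exists under that directory, the same call would return the translated text instead.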

Gettext in Django

Django is the web development framework that we use to develop Transifex (while Django developers use Transifex for localization management and crowdsourcing). It provides several wrappers around the gettext calls that are more convenient for web applications. Specifically,

For marking translatable strings in the code:

from django.utils.translation import ugettext_lazy as _
translatable_string = _(u'A sentence to translate.')

For html templates:

{% load i18n %}
<h1>{% trans "A header to translate" %}</h1>
{% blocktrans count ppl|length as num %}
Person:
{% plural %}
People ({{ num }}):
{% endblocktrans %}

To create the source language file (from the command-line at the base directory of the project):

python manage.py makemessages --locale=en

To compile language files:

python manage.py compilemessages --all

Django keeps a settings variable called “LANGUAGES”. This is a list of all the languages that the product can be displayed in and that will be compiled when calling ‘compilemessages’. Transifex sets this variable by reading a plaintext file in ‘locale/LINGUAS’, which contains a simple list of those target languages.
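
As a rough sketch of how a settings file could derive that list from ‘locale/LINGUAS’, consider the following. The file format (whitespace-separated language codes, with ‘#’ starting a comment), the helper names and the LANGUAGE_NAMES mapping are all assumptions made for illustration. This is not the actual Transifex settings code.

```python
def parse_linguas(text):
    """Return the language codes listed in LINGUAS-style text."""
    codes = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        codes.extend(line.split())    # allow several codes per line
    return codes


def build_languages(codes, names):
    """Build a Django-style LANGUAGES list of (code, display name) pairs."""
    return [(code, names.get(code, code)) for code in codes]


# Hypothetical display names; a real project would supply its own.
LANGUAGE_NAMES = {"el": "Greek", "pt_BR": "Brazilian Portuguese"}

sample_linguas = "el\npt_BR  # added for the last release\n\nde\n"
LANGUAGES = build_languages(parse_linguas(sample_linguas), LANGUAGE_NAMES)
print(LANGUAGES)
```

In a real settings module the text would be read from the LINGUAS file rather than from an inline string.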

Finally, Django provides functionality that helps users switch languages when using the product (e.g. the language selection dropdown at https://www.transifex.com).

Version control

We use Git, a distributed version control system, for maintaining the state of the product’s code through time (a.k.a. version control) and for distributing the code among the development team. If you’re not familiar with version control, here is a brief overview:

Imagine your product’s code as a chain of states in time. These states, or ‘revisions’, are kept in a repository on the developer’s computer. To begin working on the code, a developer copies the latest revision to the ‘working directory’ on their computer.

After working for some time, the developer adds the changes to the version control system and issues the ‘commit’ command. This command compares the current working directory with the previous version and generates the difference between the two — the changeset. The changeset is then committed into the repository along with the new state.
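
If the idea of a changeset feels abstract, the following sketch may help. It uses Python's standard difflib module to compute the difference between two versions of a small file. The file contents and the labels are made up for illustration, and Git's internal storage works differently; the point is only what "the difference between two states" looks like.

```python
import difflib

# Two hypothetical states of the same file: the last revision and the working copy.
previous_revision = ['msgid "Hello"\n', 'msgstr ""\n']
working_directory = ['msgid "Hello, world"\n', 'msgstr ""\n']

changeset = list(
    difflib.unified_diff(
        previous_revision,
        working_directory,
        fromfile="previous revision",
        tofile="working directory",
    )
)
print("".join(changeset))
```

Lines starting with ‘-’ were removed, lines starting with ‘+’ were added, and everything else is unchanged context.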

As you read further, you will only encounter three commands:

  1. ‘git checkout’, which copies a specific revision to the working directory,
  2. ‘git add’, which adds a file under version control and
  3. ‘git commit’, which commits a new changeset-revision to your repository.

The advantages of working with a distributed version control system are many and are not the subject of this guide.

The Transifex command-line client

The Transifex-client is a command-line tool we created to help developers sync their content with the content on Transifex. In general, this means pushing source language files and pulling translation files.

In order to use this command, navigate to your project’s directory and call ‘tx init’. This initializes your connection settings with the Transifex server, which are set on an operating system profile-level (in Unix-based systems, this means that they are stored in ~/.transifexrc), and creates a ‘.tx’ directory in your project’s directory that stores your current project’s localization settings.

Then you call ‘tx set’ along with some options that set the association between your project’s language files and the resources that are set up on the Transifex server. These settings are stored in your ‘.tx’ folder for the current project.

Now it’s a simple matter of running ‘tx push’ and ‘tx pull’ to sync your files with Transifex.

One of the neat things you can do is commit the .tx folder under version control so other developers can use the same sync settings as you without running the ‘tx init’ and ‘tx set’ commands themselves.

Here’s how the Transifex-client is configured for Transifex itself (this will work for most Django projects on Git):

# install client
pip install transifex-client

cd <path to Transifex project>

# We assume the Transifex project has been created in Transifex
# (https://www.transifex.com/transifex/transifex/)

tx init # you may be asked for your Transifex credentials
tx set --auto-local \
  --resource=transifex.txc \
  --source-language=en \
  --type=PO \
  'transifex/locale/<lang>/LC_MESSAGES/django.po' \
  --execute

# details on how to use the 'tx set' command can be found with 'tx help set'
# or at https://docs.transifex.com/client/config

# (optional) add sync configuration under version control
git add .tx/
git commit --message="added .tx configuration directory under version control"
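
To give a feel for what ends up in the ‘.tx’ directory, here is a sketch that parses a sample configuration and works out where the file for a given language lives. The sample text is an assumption about the general shape of ‘.tx/config’; the file that ‘tx set’ actually writes for your project may contain different or additional keys. The helper name translation_path is hypothetical.

```python
import configparser

# Illustrative content only; not copied from the real Transifex repository.
SAMPLE_TX_CONFIG = """\
[main]
host = https://www.transifex.com

[transifex.txc]
file_filter = transifex/locale/<lang>/LC_MESSAGES/django.po
source_lang = en
type = PO
"""


def translation_path(config_text, resource, language):
    """Expand the resource's file_filter expression for one language code."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser[resource]["file_filter"].replace("<lang>", language)


print(translation_path(SAMPLE_TX_CONFIG, "transifex.txc", "el"))
```

The ‘<lang>’ placeholder is the same one passed to ‘tx set’ above, which is how one expression can describe the files for every language.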

Localization workflow

Most software products are distributed incrementally in releases, with release dates and upcoming features announced in advance. Before the release, developers make sure the product is bug-free and well documented. And if the product is to be offered in other languages, teams will announce a ‘string freeze’ on the product.

For developers, this means they can no longer change the code in a way that affects the source strings (bug fixes and improvements, however, are acceptable). For translators, the string freeze provides them adequate time to work.

When the release day arrives, developers get the translations for their target languages, compile them into the product, and release it.

How Transifex is actually used

Here’s the gist. Try to work out what each step does as an exercise; I will explain each one in detail afterwards:

# String freeze date:
# -------------------

git checkout master
tx pull --all
git add locale # stage the pulled files so the commit below includes them
git commit --message="stored past translations under version control"
python manage.py makemessages --locale=en
git add locale
git commit --message="updated source language files"
tx push --source

# Release date:
# -------------

tx pull --all
gedit locale/LINGUAS # choose sufficiently translated languages
python manage.py compilemessages --all
git add locale
git commit --message="updated target language files (translations)"

Let’s break this whole thing down:

String freeze

‘git checkout master’:

Copy the latest revision of the repository to the working directory.

‘tx pull --all’
and
‘git commit --message="stored past translations under version control"’:

Before re-creating the source language files for Transifex, I pull all translations from the Transifex server to my working directory and commit them to the repository. This step is optional and mainly serves to preserve translations for future use and/or backup. Transifex employs a Translation Memory feature that provides powerful restoring functionality wherever needed (more on this later). However, it never hurts to have an extra backup.

‘python manage.py makemessages --locale=en’:

Extract all translatable strings from my code (marked with gettext) and create the source language file, which is stored in ‘locale/en/LC_MESSAGES/django.po’.

‘git add locale’
and
‘git commit --message="updated source language files"’:

Commit the updated source language file under version control.

‘tx push --source’:

Push the newly updated source language file to the Transifex project on Transifex.

Release

‘tx pull --all’:

Get all available target language files from Transifex and put them in the appropriate place.

‘gedit locale/LINGUAS’:

Choose which languages to include when compiling translations. For example, our principle for Transifex is to include all languages that are 90% translated or above. The completion percentage can be viewed on the project overview page on Transifex (https://www.transifex.net/projects/p/transifex/).

‘python manage.py compilemessages --all’:

Compile all languages listed in the LINGUAS file into the final product.

‘git add locale’
and
‘git commit --message="updated target language files (translations)"’:

Commit the translations to the repository.
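
The language selection step in the release procedure is the only manual one, and it can be sketched in code. The example below applies the 90% rule to a set of completion statistics and produces the text for the LINGUAS file. The statistics, the function names and the way the numbers are obtained are all hypothetical; in practice you would read the percentages off the project overview page.

```python
def choose_languages(stats, threshold_percent=90):
    """Return the codes whose translated/total ratio meets the threshold."""
    chosen = []
    for code, (translated, total) in sorted(stats.items()):
        # Integer arithmetic avoids floating point surprises at the boundary.
        if total > 0 and translated * 100 >= threshold_percent * total:
            chosen.append(code)
    return chosen


def render_linguas(codes):
    """Format the chosen codes as LINGUAS text, one code per line."""
    return "\n".join(codes) + "\n"


# Made-up numbers: (translated strings, total strings) per language.
completion = {"el": (950, 1000), "de": (900, 1000), "fr": (899, 1000), "ja": (0, 0)}
selected = choose_languages(completion)
print(render_linguas(selected))
```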

The automated nature of this process means you can have as frequent release cycles as you like.
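
Because every step is a plain command, the whole string freeze procedure can be wrapped in a small script. The sketch below is one possible way to do it, with a dry-run mode that only prints what would be executed. The function name run_steps is hypothetical, and the ‘git add locale’ step before the first commit is our assumption about how the pulled files get staged.

```python
import shlex
import subprocess

STRING_FREEZE_STEPS = [
    "git checkout master",
    "tx pull --all",
    "git add locale",  # assumption: stage pulled files before committing
    'git commit --message="stored past translations under version control"',
    "python manage.py makemessages --locale=en",
    "git add locale",
    'git commit --message="updated source language files"',
    "tx push --source",
]


def run_steps(steps, dry_run=True):
    """Run each command in order, stopping at the first failure."""
    commands = [shlex.split(step) for step in steps]
    for argv in commands:
        if dry_run:
            print("would run:", " ".join(shlex.quote(arg) for arg in argv))
        else:
            subprocess.run(argv, check=True)  # raises if a command fails
    return commands


planned = run_steps(STRING_FREEZE_STEPS, dry_run=True)
```

Calling run_steps with dry_run=False from the project's base directory would execute the commands for real.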

Now that you have seen how to automate the localization process, here are some advanced tips:

What happens when you ‘tx push’
(and a few notes around Translation Memory)

A source language file is essentially a collection of source strings. This is reflected in the way resources are stored in Transifex’s database: when you push a source language file on top of an existing resource on Transifex, it finds the subset of source strings that appear both in the source language file and on Transifex. For these strings, it keeps all translations for all languages as they are. Source strings that were on Transifex but are missing from the updated source language file are not simply deleted, however:

  • Firstly, they are stored in the project’s Translation Memory. Actually, all translations ever submitted to a project on Transifex are saved in the project’s Translation Memory, and this happens automatically immediately after they’re submitted.
  • Secondly, they are compared to the new strings imported from the updated source language file. If there is a high-enough similarity between the old and new source strings, Transifex doesn’t consider there to be a ‘delete’ action on the old string and an ‘insert’ action on the new string, but an ‘update’ action on the pair altogether.

The Translation Memory finds similarities between source strings and presents the existing translations of one source string as automated suggestions for the other.

So, when Transifex tries to insert a new source string to a resource, one of the following three outcomes will happen:

  1. If a 100% match from the Translation Memory occurs and the relevant setting is set in the project’s admin panel, Transifex will automatically use the entries available in Translation Memory as translations for all available languages.
  2. If Transifex considers this insertion an update, it uses all existing translations as suggestions for that specific string. This means that if the existing translations are still appropriate, translators can apply them with a single mouse-click.
  3. Otherwise, Transifex simply inserts this new string with no available translations or suggestions.
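
The decision logic described above can be approximated in a few lines. This is a toy model only: the similarity measure (difflib's ratio), the 0.8 threshold, the function names and the data are assumptions, and Transifex's actual matching algorithm is not described in this post.

```python
import difflib

SIMILARITY_THRESHOLD = 0.8  # hypothetical cut-off for treating a change as an update


def split_sources(old_sources, new_sources):
    """Return (kept, removed, added) source strings for a push."""
    old, new = set(old_sources), set(new_sources)
    return old & new, old - new, new - old


def classify_new_string(new_source, removed_sources, translation_memory, autofill=True):
    """Decide how a newly pushed source string would be handled."""
    # Outcome 1: exact Translation Memory match and autofill is switched on.
    if autofill and new_source in translation_memory:
        return ("autofill", translation_memory[new_source])
    # Outcome 2: close enough to a removed string to count as an update.
    best_match, best_ratio = None, 0.0
    for old_source in sorted(removed_sources):
        ratio = difflib.SequenceMatcher(None, old_source, new_source).ratio()
        if ratio > best_ratio:
            best_match, best_ratio = old_source, ratio
    if best_match is not None and best_ratio >= SIMILARITY_THRESHOLD:
        return ("update", translation_memory.get(best_match, {}))
    # Outcome 3: a genuinely new string with nothing to offer.
    return ("insert", {})


memory = {
    "Save changes": {"fr": "Enregistrer les modifications"},
    "Delete the file": {"fr": "Supprimer le fichier"},
}
kept, removed, added = split_sources(
    ["Save changes", "Delete the file"], ["Save changes", "Delete the files"]
)
for source in sorted(added):
    print(source, "->", classify_new_string(source, removed, memory))
```

In the ‘update’ case the returned translations would be offered as suggestions rather than applied automatically.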

Restoring Transifex resources

Many products have major releases several times a year and developers want to include translations to as many languages as possible in each release. However, it’s not always practical to invest the effort/time/money to do so straight away. In some cases, you may have started on the next release only to realize that you want to increase the number of target languages in a past release.

Suppose you issued a stable release on January 1st called ‘1.7’ and 5 target languages made it into this release. On January 8th you generate a new source language file from your development version and push it to Transifex because you need translators working on your next release as soon as possible.

In February, due to the incredible success of your 1.7 release, you realize you can add many more target languages to your product, translated by your professional translators or even by your community. Since your development version and the 1.7 release share a lot of text content, it would be perfect if you could apply some of the new strings/languages you have available to it. The proper approach is to issue a ‘maintenance release’ called ‘1.7.1’ which includes more, and more complete, target languages. However, you run into a roadblock: the resource on Transifex was produced from a different source language file than the one present in the 1.7 stable release.

Transifex can solve this issue automatically. All we need to do is run:

# do the maintenance release
# --------------------------

git checkout 1.7 # '1.7' is a tag that pinpoints the stable release's revision
python manage.py makemessages --locale=en
tx push --source
tx pull --all # yes, issue this immediately
gedit locale/LINGUAS # set new languages to be compiled
python manage.py compilemessages --all
git add locale
git commit --message="maintenance release: added more translated content"
git tag 1.7.1

# set things back to normal
# -------------------------

git checkout master
python manage.py makemessages --locale=en
tx push --source

Note: In case you noticed and are confused, the ‘git commit’ command will create a new ‘head’ on your repository. This is an advanced feature of distributed version control systems you needn’t worry about for now.

Notice how no time has passed between the ‘tx push’ and ‘tx pull’ commands for translators to work. Yet new translations will emerge for your project. During the ‘tx push’ command, Transifex uses its “magic” to use newly available translations for the source strings.

By pushing an older version of your source language file on an existing resource on Transifex, you can restore your resource to a previous point in time. Actually, you do more than that. Not only do you restore old translations, but you also add new translations to your old source strings!
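
A toy model may make the restore behaviour easier to picture. Here the Translation Memory is a plain dictionary that remembers every translation ever submitted, and pushing a source file just looks each string up in it. The function name push_source and all the sample strings are made up; the real service is of course more involved.

```python
def push_source(source_strings, translation_memory):
    """Return the translations available for each pushed source string."""
    return {
        source: dict(translation_memory.get(source, {}))
        for source in source_strings
    }


# Everything translators have submitted so far, including German work
# that was done after the 1.7 release went out.
memory = {
    "Sign in": {"fr": "Connexion", "de": "Anmelden"},
    "Sign out": {"fr": "Quitter"},
    "Dashboard": {"fr": "Tableau de bord", "de": "Instrumententafel"},
}

release_1_7_sources = ["Sign in", "Sign out"]  # "Sign out" was later removed
restored = push_source(release_1_7_sources, memory)
print(restored)
```

In this model the old "Sign out" string gets its translation back, and "Sign in" picks up the German translation that was added while working on the development version.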
