Using Transifex with Google Drive

Combining Transifex with Google Drive gives you the ability to edit and share your project files with your entire team. In addition to managing your project with the Transifex client, you can use our auto-update function to update resource files without the need for a third-party service.

Using the Auto-Update Function

Transifex offers an auto-update function that periodically scans a remote file and updates your resources based on changes to that file. The only requirement is that the remote file has to be publicly accessible. To do this, we’ll use a feature in Google Drive designed for hosting static web pages. We’ll then create a URL that Transifex can use to access our resource files.

Start by opening your project in Google Drive. Navigate to your project folder, right-click your resource file, and click “Share”. The Share Settings popup window appears: by default, the file is limited to specific users. We need to make the file publicly available so that Transifex can access it. Note that this has the side effect of making your resource file available to anyone on the public Internet.

Click “Get shareable link” at the top-right corner of the window to show the link sharing dropdown. Click “More” under the dropdown, select “Public on the web”, then click “Save.” Your share settings should look similar to the following screenshot:

Google Drive share popup

Using Google Drive’s Host Feature

After clicking Save, Google Drive gives you a link to the file. Accessing the file through this link will open a file preview, which prevents Transifex from correctly parsing the file’s contents. Instead, we’ll use Google Drive’s host feature to access the file’s contents exactly as they are.

A regular Google Drive link looks similar to the one below. The file ID is a unique string of alphanumeric characters that identifies the file:
https://drive.google.com/file/d/[file ID]/view?usp=sharing

Copy the file ID into the following URL:
https://googledrive.com/host/[file ID]

Test the link by opening the URL in a browser. If you can see the contents of the file, then the link works.
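If you need to convert many links, the transformation is easy to script. Here's a minimal sketch in Node.js; the toHostUrl helper is ours, and it assumes the standard /file/d/[file ID]/ link format shown above:

// Convert a regular Google Drive share link into a googledrive.com/host URL.
function toHostUrl(shareUrl) {
	var match = shareUrl.match(/\/file\/d\/([^\/]+)\//);
	if (!match) {
		throw new Error('Unrecognized Google Drive link: ' + shareUrl);
	}
	return 'https://googledrive.com/host/' + match[1];
}

// toHostUrl('https://drive.google.com/file/d/abc123/view?usp=sharing')
// returns 'https://googledrive.com/host/abc123'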

Supplying the URL to Transifex

In Transifex, navigate to your project’s resource list and click “Auto update resources.”

Transifex resources

A new dialog box appears. Enter your newly-created URL and click “Update URL.” If Transifex can successfully access the URL, then the URL turns green.

Auto-update source files

Back in Google Drive, update your source file and save the changes. Transifex polls the URL for changes once a day, so the updates won’t appear immediately. You can disable the auto-update function by removing the resource’s URL.

Handling Changes

When the auto-update function detects changes, it performs one of three actions:

  1. New strings are added to the project as expected.
  2. Modified strings are added to the project as new strings. The old string and its translations are kept in the project’s Translation Memory.
  3. Missing strings are removed from the resource. The string and its translations are kept in the project’s Translation Memory.

Using the Transifex Client

If you choose not to use the auto-update function, you can manually sync your resource files using the Transifex Client. While it’s less automated, the benefit is that you have control over when the Transifex project is updated. You can use Google Drive to share your project with other users, including the project’s configuration files. This way, another user can run the Transifex Client in the project folder without having to first configure the project. Just be careful not to store your .transifexrc file with the project, as it contains your account credentials.

Google Drive share

For more information, read our FAQ on updating source files.

Using Transifex with Dropbox

Dropbox is shaking up the way many businesses share files. The ability to instantly update files across multiple users and devices makes it easy for companies to stay in sync. With Transifex, you can easily update localization files stored in your Dropbox account.

Using the Auto-Update Function

Transifex provides an auto-update function that modifies your project’s source file based on the contents of a URL. Transifex periodically checks the URL and adds or removes strings based on the contents of the file. This file can be hosted anywhere – Dropbox, GitHub, or even a simple web server – as long as it’s publicly accessible.

To use the auto-update function, you’ll need to create a public URL for your Dropbox file. Using either the Dropbox desktop app or website, right-click your source file and click “Share.” Dropbox will automatically generate a public link to the file, which you’ll provide to Transifex.

In Transifex, navigate to your project’s resource list and click “Auto update resources.”

Transifex resources

A new dialog box appears. Enter the URL generated by Dropbox and click “Update URL.” If Transifex can successfully reach the URL, the URL turns green.

Auto-update source files

Over in Dropbox, update your source file and save the changes. Transifex polls the URL for changes once a day, so the updates won’t appear immediately. You can disable the auto-update function by removing the resource’s URL.

Handling Changes

When the auto-update function detects changes, it performs one of three actions:

  1. New strings are added to the project as expected.
  2. Modified strings are added to the project as new strings. The old string and its translations are kept in the project’s Translation Memory.
  3. Missing strings are removed from the resource. The string and its translations are kept in the project’s Translation Memory.

Handling Syntax Errors

In some cases, Transifex will experience a syntax error when trying to parse text files. This is due to Dropbox’s preview feature, as shown in this screenshot:

Dropbox preview

To ensure Transifex receives a plain text copy of the file, append “?raw=1” to the end of the URL (if the URL already contains a parameter, such as “?dl=0”, use “&raw=1” instead). For example:

https://www.dropbox.com/s/myproject/en.json?dl=0

becomes:

https://www.dropbox.com/s/myproject/en.json?dl=0&raw=1

Dropbox raw
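If you're generating these links in code, the tweak is a one-liner. A minimal sketch in Node.js; the toRawUrl helper is ours:

// Append raw=1 so Dropbox serves the file's plain contents instead of a preview.
function toRawUrl(shareUrl) {
	return shareUrl + (shareUrl.indexOf('?') === -1 ? '?raw=1' : '&raw=1');
}

// toRawUrl('https://www.dropbox.com/s/myproject/en.json?dl=0')
// returns 'https://www.dropbox.com/s/myproject/en.json?dl=0&raw=1'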

Using the Transifex Client

If you prefer a more hands-on approach, you can use the Transifex Client. While it’s less automated than providing a URL, the benefit is that you have control over when the Transifex project is updated. By storing your project configuration with your project files, you can easily share your project with others, as shown in the screenshot below. Just be careful not to store your .transifexrc file with the project, as it contains your account credentials.

Dropbox share

For more information, read our FAQ on updating source files.

How to Use Git to Track Changes in Translation Files

There’s been a lot of buzz about managing translations through version control. As localization becomes more integrated into your projects, it’s important that members of your development and localization teams stay synchronized. In this post, we’ll discuss best practices when using Git to manage your own projects.

What is Git?

Git is a version control system (VCS) designed by Linus Torvalds, the creator of Linux. Git tracks changes to source code for multiple users, allowing teams to easily share the same codebase.

Version control is an extensive topic, but we’ll focus on three key components: repositories, commits, and branches.

Repositories are directories that hold your code files. Repositories also double as workspaces. You can create a new repository on your local machine, or you can copy (clone) a repository from a remote machine and work on it locally.

Commits are changes to source code that are applied to a repository. Commits make it easy to group similar changes for organizing, managing, and auditing tasks.

Branches are deviations in the codebase. They let you create new working copies of code without disrupting the original copy. You can easily switch between branches, and any changes made to a branch can be merged into another branch. Git repositories provide a default master branch that forms the trunk for other branches.
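To make these concepts concrete, here's what a typical cycle looks like on Git's command line (the repository URL and names below are made up for illustration):

$ git clone https://example.com/myproject.git   # copy a remote repository locally
$ cd myproject
$ git checkout -b fix-greeting                  # create a branch off master
$ git add index.html                            # stage a change
$ git commit -m "Fix typo in greeting"          # record it as a commit
$ git checkout master
$ git merge fix-greeting                        # fold the branch back into master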

There are many other components to Git including forks, push and pull requests, staging, and tagging. For a more comprehensive overview of Git, see Pro Git, an open source book covering a wide range of Git’s features.

Hosting Git Repositories Online

You can host a Git repository on any computer, but there are websites that provide online repository hosting. GitHub, BitBucket, GitLab, Gitorious, and Kiln all provide hosting for public and private repositories.

Git Best Practices

When working with multiple developers, it’s vital to set a version control strategy that everyone understands and agrees with. This section discusses Git best practices that ensure consistency and provide a reliable trail of changes.

Stay in Sync

A code repository is only effective if it’s kept up-to-date with frequent commits. Commits should consist of small, related changes rather than large, varied changes. For instance, if you fix a typo, you can apply the change in a single commit. However, if you fix a typo and a bug, you should make two commits: one for the typo and one for the bugfix. Small, related commits make it easy not only to track changes, but also to roll back changes if necessary. They also help other developers understand exactly what changed while you were working on the file.

Branch Out

Branching does more than just organize code changes: it lets developers modify the codebase without impacting another user’s workspace. A branch is essentially a snapshot of the base branch at a particular commit. New commits build off of that initial commit, allowing the branch to extend independently of other branches including the one it’s based on.

Imagine your organization is preparing a new software release. While the code is being finalized, one of your developers starts working on a new feature for the next release. Not wanting to lose work, he commits his changes before leaving for the night. Without a good branching policy in place, his commit ends up introducing incompatible features into master and the automated build fails.

If a new branch had been created specifically for that feature, the developer could have committed his changes to a different codebase than the one used for the release. The next section shows commonly used branching techniques. There are several well-tested methods, but one of the more popular is the Gitflow Workflow.

The Gitflow Workflow
Originally proposed by Vincent Driessen, the Gitflow Workflow establishes two main branches: master and develop. Master is limited to production-ready code, while develop contains ongoing development changes. Other branches – features, bugfixes, and releases – are based off of these two core branches.

Gitflow workflow

Example of a repository using the Gitflow Workflow (Atlassian)

When development is started on a new feature, develop branches off into a new branch. Once that feature is complete, it gets merged back into the develop branch. As the product nears release, develop branches into release, which is limited to preparing the code for the upcoming release. In the meantime, changes can still be committed to the other branches. When the release is ready, it’s merged back into develop as well as master.

Besides develop, the only other branch that derives directly from master is hotfix. When a hotfix is released, hotfix is merged back into master and develop in order to incorporate the changes into future releases. This results in a flow where changes are logically separated, but still combined into the main branches upon completion.
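In plain Git commands, a Gitflow cycle might look like the sketch below (the branch names are hypothetical):

$ git checkout develop
$ git checkout -b feature/new-login       # start a feature off develop
$ git commit -am "Add new login form"     # feature work happens here
$ git checkout develop
$ git merge feature/new-login             # the finished feature returns to develop
$ git checkout -b release/1.2             # develop branches into a release
$ git checkout master
$ git merge release/1.2                   # the release lands in master...
$ git checkout develop
$ git merge release/1.2                   # ...and back into develop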

Combine Often

Merging changes is critical to ensuring consistency across branches. Once a feature is finalized, its changes need to join the changes submitted by other developers in other branches. This can be accomplished using two methods: merging and rebasing.

Merging vs. Rebasing
Merging and rebasing both solve the problem of synchronizing two branches, but they go about it in very different ways. For instance, imagine you’re working with a repository that has two branches: master and feature. You’ve been assigned to work on new code and commit your changes to feature. In the meantime, other developers are committing changes to master. Once you’ve finished coding, you need to update master with the changes made to feature.

Merging resolves this by creating a new commit in feature containing all of the intermediate commits from master. While this fully preserves both branches, it also generates a lot of extraneous commits, as it needs to incorporate each change from master all the way back to the branch point.

Git merge

Merging the master branch into a feature branch (Atlassian)

Rebasing takes the opposite approach by moving the commits from feature onto master. Rebasing rewinds to the commit in master that feature is based off of, then reapplies each commit from feature on top of master’s latest commit. While this keeps the history linear and avoids extraneous merge commits, it has the effect of essentially rewriting the project’s history.

Git rebase

Rebasing the same feature branch onto master (Atlassian)

Git’s rebase command lets you determine exactly how the branches are integrated. For instance, interactive rebasing will walk you through each feature commit and let you alter the commit before it’s applied to master. For more details, see Merging vs. Rebasing in the Atlassian Git Tutorial.
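In command form, the two approaches from the scenario above boil down to this (both run with feature checked out):

$ git checkout feature
$ git merge master        # merge: ties the branches together with a new merge commit

$ git checkout feature
$ git rebase master       # rebase: replays feature's commits on top of master's tip
$ git rebase -i master    # interactive rebase: review each commit as it's reapplied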

Synchronizing Git With Transifex

There are several ways to manage code with Git while still managing localizations with Transifex.

Using the Transifex Client with Git

You can simply use the Transifex Client to push source changes to Transifex. The benefit to this approach is that it ensures you have access to the latest translations and can push resource updates to the Transifex project.

The drawback is that you’ll need to enforce a policy for pushing resource updates. Features that are still in development can change completely, and having your localizers work on text that might not appear in the final product is a waste of time and money. Developers will also need to remember to include localization updates when committing their changes.

Using txgh

Txgh is an open-source Sinatra server created by Strava. It uses webhooks to automatically trigger updates in GitHub and Transifex. If a developer commits a change to a source translation file in GitHub, txgh automatically updates the file in Transifex. And if a translator’s change in Transifex brings a file to 100% translated, txgh creates and pushes a new commit to GitHub. Txgh removes the added step of having to run the Transifex Client each time a translation is added.

There’s No One-Size-Fits-All Solution

Different organizations will prefer different workflows based on their needs and culture. Gitflow is far from the only Git workflow in popular use, and it may not be ideal for your development cycle. You can find a comparison of Git workflows in Atlassian’s Git Tutorial. You can find additional resources for using Git through GitHub’s help site. If you want to learn more about Git, Code School provides an interactive online course.

Translate your Zendesk Knowledge Base with Transifex Sync

Knowledge bases help your customers get answers to their questions instantly; they’re key to customer success. To make translating your Zendesk knowledge base easier, we’ve built the Transifex Sync Zendesk app.

Transifex Sync Zendesk translation app

Transifex Sync automatically detects all the articles, sections, and categories in your Zendesk Help Center. With a click, you can upload the content to Transifex and translate it like you would any other content in Transifex. When the translations are done, just download them into Zendesk using Transifex Sync and you’re ready to provide answers to users worldwide. Easy!

Learn more about Transifex Sync here, or check it out in the Zendesk Marketplace.

Translate your Help Scout Docs with Transifex

Help Scout is a popular help desk software with a built-in knowledge base solution called Docs. If you use Docs, you can now translate it with Transifex Live. Not only can you provide great email support, but you can offer answers to your users 24/7 in their native language too.

Multilingual Help Scout Docs

Translating your Help Scout Docs with Transifex Live is simple. Just add the Transifex Live JavaScript snippet in your site’s settings, save the articles, translate, and publish. Best of all, when you use Transifex Live, you only need to set up one knowledge base to offer it in multiple languages.
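At the time of writing, the snippet looks similar to the sketch below; swap in the API key from your own Transifex Live project:

<script type="text/javascript">window.liveSettings = { api_key: "YOUR_API_KEY" };</script>
<script type="text/javascript" src="//cdn.transifex.com/live.js"></script>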

Check out our sample Help Scout knowledge base that was translated with Transifex Live, or get started by following the instructions in the documentation.

P.S. If you haven’t heard, we’ve also built a Transifex Live-based WordPress plugin for translating your WordPress site.

Translating Your Node.js App with Transifex

Node.js is a server-side environment for delivering scalable JavaScript apps and services. It’s quickly gaining popularity, having been used by Netflix, PayPal, LinkedIn, and others. In this post, we’ll show you how you can use Transifex to easily localize your Node.js apps.

Setting Up Node.js for Localization

While there are several modules for localizing in Node.js, we recommend the i18n-node module. i18n-node stores localization data in standard JSON, making it easy to pass text to and from Transifex.

i18n-node

i18n-node is a simple translation module for integrating localization into a Node.js app. Each locale stores its translations in a separate JSON file, which you’ll use to send localization data to and from Transifex. You can add i18n-node to your project using npm:

$ npm install i18n

Load i18n along with the rest of your modules:

// load modules
var http = require('http'),
      i18n = require('i18n'),
      ...

To configure i18n, specify the locales you want to include in your project and the directory where the localization files will be stored. You can also specify a default locale, a unique extension for newly created localization files, and more. A full list of options is available on the i18n-node GitHub page.

In this example, we’ll store English, Spanish, and German language files in the locales directory under the current directory:

i18n.configure({
	locales:['en', 'es', 'de'],
	directory: __dirname + '/locales',
	defaultLocale: 'en',
	extension: '.json'
});

i18n-node supports both shorthand locale codes and IETF language tags. For example, to specify British English, change “en” to “en_GB”.

You can reference a localized string by using the global i18n keyword. For example, if you have a variable that stores the string “Hello”, you can replace it with a localized string using i18n.__(keyword):

// Hard coded string
// var greeting = 'Hello';

// Localized string
var greeting = i18n.__('Hello');

The exception is when responding to HTTP requests, in which case you can attach the reference to the request object. For example, let’s create a simple web server that displays “Hello” to the user:

app = http.createServer(function(request, response) {
	i18n.init(request, response);
	response.end(response.__('Hello'));
});

app.listen(3000, '127.0.0.1');

i18n automatically generates the default locale file using the keywords provided. When the app starts, i18n creates three new files:

  • en.json, which contains the default keys
  • es.json, which will contain the Spanish translations
  • de.json, which will contain the German translations
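With the single “Hello” string above, the generated en.json would look something like this (the other locale files start from the same keys until real translations are pulled in):

{
	"Hello": "Hello"
}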

Further down, we’ll show you how to connect your app to Transifex using the Transifex client.

RequireJS

RequireJS provides its own i18n implementation that can also be imported into Transifex. If your Node.js app uses RequireJS, add the i18n.js plugin to your project.

Similar to i18n-node, RequireJS stores translations as key-value pairs. However, rather than a standard JSON key-value file, RequireJS defines the key-value set as a JavaScript object passed to define().

RequireJS searches for localization files in the “nls” folder under the project directory. Additional languages are stored in separate directories and are enabled by adding the language to the main file.
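As a rough sketch (the file and key names here are ours), a root bundle and one locale bundle might look like this:

// nls/messages.js - the root bundle
define({
	"root": {
		"hello": "Hello"
	},
	"es": true,   // enables nls/es/messages.js
	"de": true    // enables nls/de/messages.js
});

// nls/es/messages.js - the Spanish bundle
define({
	"hello": "Hola"
});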

For more details on localizing with RequireJS, see the RequireJS documentation on i18n.

Sending the Source File to Transifex

The Transifex Client makes it easy to move translations between your app and Transifex. If you don’t already have the Transifex Client installed, follow the instructions here.

Start by creating a new Transifex project using the JSON Key-Value (.json) file format. The en.json file will act as our source file. Once you’ve created the project, use the Transifex Client to create a local repository in your app directory:

$ cd myapp
$ tx init

tx init creates a .tx directory in the current folder with a basic config file. We’ll set up the configuration by specifying the directory where localized files are stored, setting en.json as the initial localization resource, setting English as the source language, and specifying standard JSON as the file type.

Transifex_project_slug specifies the URL slug for your project, while default_resource_slug specifies the URL slug for the default resource. You can find both of these in your project page.

[main]
host = https://www.transifex.com

[Transifex_project_slug.default_resource_slug]
file_filter = locales/<lang>.json
source_file = locales/en.json
source_lang = en
type = KEYVALUEJSON

Finally, push any local changes to Transifex using tx push -s. The -s flag tells the Transifex Client to update the project with the source file.

$ tx push -s
Pushing translations for resource nodejs-test.enjson:
Pushing source file (locales/en.json)
Done.

The project page should reflect the updated source file:

Transifex project page

Localizing in Transifex

Now that your source file has been pushed to your project, you can use Transifex to start localizing. We’ll show you how to add new languages, update translations, then push the changes back to your Node.js application.

Adding Existing Languages

If your app already contains translations, you can push those translations to Transifex using the Transifex Client. Use tx push with the -t flag instead of the -s flag to push any existing translations to your project.

$ tx push -t
Pushing 'de' translations (file: locales/de.json)
Pushing 'es' translations (file: locales/es.json)
Done.

For smaller projects, it may be easier to create a new translation and enter any existing translations as comments. For more information on how you can do this, see the Transifex Editor tutorial.

Adding New Languages

Begin by clicking “Edit Languages” in the Project Languages table:

Project languages

In the popup window, enter the languages that you want to add to your project. In this case, we’ll add Spanish (es) and German (de). As you type, the search box will filter based on the name of the language. You can also specify locales such as Latin American Spanish (es_419) or Austrian German (de_AT). When you’re ready, click Apply.

Add languages

Adding Translations

Your new languages will appear in your project dashboard. Since no work has been done on them, they’ll both appear at 0% completion. You’ll also see that no translators have been assigned to the new languages. You can learn more about adding translators through the People page.

Adding translations

Let’s start by filling in our German translation. Clicking on the German language will bring you to the resource page. The en.json source file is shown along with its category and completion. From here, you can also assign and manage the project’s team members.

Transifex resources

Click on the en.json resource, then click Translate in the popup window:

Translation popup

This will bring you to the Transifex Editor, where you can modify and review your translations. The left-hand panel shows the strings provided by your resource file. In this case, we only have the one string, “Hello”. The right-hand panel provides tools for entering and reviewing translations. Additionally, you can view suggested translations, read through a history of translations, add comments or instructions, and more. You can learn more about the Transifex Editor through the Editor tutorial.

When you’re ready to translate, select “Hello” so that it appears in the right-hand panel under “Untranslated String”. Type “Hallo” in the translation box underneath the string and press the Tab key or the Save button. The translation will be marked as unreviewed until it can be verified by a reviewer or manager. In the meantime, do the same for the Spanish translation by switching to the Spanish resource and entering “Hola” in the translation box.

Transifex translation editor

The project dashboard will now show the languages as pending review. Next, we’ll sync the new translations back to Node.js.

Syncing Changes Back to Node.js

Using the Transifex Client, you can pull changes back to your Node.js app with tx pull:

$ tx pull -a
New translations found for the following languages: de, es
Pulling new translations for resource nodejs-test.enjson (source: locales/en.json)
 -> de: locales/de.json
 -> es: locales/es.json
Done.

You can verify the results by checking your es.json and de.json files. If the process was successful, you should see a new key-value pair for each language.

$ cat locales/es.json locales/de.json
{
	"Hello": "Hola"
}{
	"Hello": "Hallo"
}

With that, your Node.js app is ready to go! If you’re using i18n-node, you can test your configuration using i18n’s setLocale() function:

i18n.configure({...});
i18n.setLocale('es');
app = http.createServer(...);

Retrieving Project Information through Node.js

There are other ways to interact with your Transifex project through Node.js. node-transifex is a community module that provides an easy interface to the Transifex API. After adding your project name and user credentials to node-transifex, you can use the module’s built-in methods to pull data about your Transifex project. For instance, as a project owner, you can use the languageSetMethod() function to return a list of the project’s languages along with the coordinators, translators, and reviewers for each language.

node-transifex also allows statistics gathering, such as the percentage of items that have been translated and the percentage of items that have been reviewed. You can gather statistics for the entire project, or for a specific language.

Install node-transifex using npm:

$ npm install transifex

Then, initialize the module with your project name and credentials:

var Transifex = require("transifex");

var transifex = new Transifex({
    project_slug: "myProject",
    credential: "username:password"
});
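As an example, a call to the languageSetMethod() function mentioned earlier might look like the sketch below; the exact callback shape is our assumption, so check the project page before relying on it:

// Hypothetical usage; the callback signature may differ in practice.
transifex.languageSetMethod("myProject", function(err, data) {
    if (err) { throw err; }
    // data lists each language with its coordinators, translators, and reviewers
    console.log(data);
});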

For a list of available functions, see the node-transifex project page.

Coming Soon: A New, Simplified URL Structure for Transifex

UPDATED 3 August 2015: The new URL structure is now live.

Early next week, we’ll be releasing a new URL structure for Transifex. This update will make navigating around easier, and you’ll be able to quickly tell which Organization or Project you’re in. There’s nothing you need to do for now, but we recommend that you read on for the details.

The current URL structure

Let’s say we have an Organization named “We love our Users”. It has a project called “Transifex Rocks” and a resource named “My Resource”. Here’s how you’d navigate to each page with the current URL structure:

Organization Dashboard: www.transifex.com/organization/we-love-our-users
Project page: www.transifex.com/projects/p/transifex-rocks
Resource page: www.transifex.com/projects/p/transifex-rocks/my-resource

As you can see, it’s not exactly easy to tell which Organization a Project belongs to just by looking at the URL. So we simplified things!

The new URL structure

With the new, simplified URL structure, here’s how things will look:

Organization Dashboard: www.transifex.com/we-love-our-users
Project page: www.transifex.com/we-love-our-users/transifex-rocks
Resource page: www.transifex.com/we-love-our-users/transifex-rocks/my-resource

As you can see, a pattern takes shape:

www.transifex.com/Organization-Slug/Project-Slug/Resource-Slug

All URLs in Transifex will follow the paradigm above. It’ll make it easier for you to understand which Organization, Project and/or Resource page you’re visiting at any given time.

(Slugs are the unique names we use within Transifex to identify individual entities such as Organizations, Projects and Resources.)

What about the API?

None of the changes will affect the API, so there’s nothing you need to do or change.

What about existing bookmarks and links?

Existing URLs will be redirected to the new URLs through the 31st of January 2016.

In order to keep our code base clean and not have it slow down our development speed, we won’t be supporting the redirects beyond that point. But until then, your bookmarks and other links are safe, and there won’t be any broken links.

We still recommend you update the links in your bookmark manager and those pointing from your website to your Transifex project page as soon as possible. This gets rid of the redirect, making pages load faster each time you or someone else clicks on a link to Transifex.

Thank you for taking the time to read through everything. Have a great day!

Elasticsearch at Transifex

We recently announced Translation Memory 3.0, which, at its core, uses Elasticsearch.

This blog post will highlight some of the lessons we learned using Elasticsearch in production. Enjoy!

Resilience

Elasticsearch isn’t our source-of-truth and, at least for now, that’s a good thing! While building Translation Memory 3.0, we saw many patches, videos, and interviews about Elasticsearch’s ability to deal with failure. As it stands, Elasticsearch, especially in a scalable production environment, needs a little bit of support.

Elasticsearch has introduced some really cool roles for nodes within a cluster. A node can be a Master, Worker, Data or Tribe node. Roles define a node’s responsibilities within a cluster. For example, a Worker node is responsible for receiving queries, aggregating the results, and returning them to the caller.

Master nodes are the most important when it comes to resilience. These nodes are solely responsible for managing cluster-level operations and master eligibility. To prevent a split-brain, the cluster must be able to ‘see’ a quorum of Master-eligible nodes. Therefore, nodes within a cluster containing 3 Masters (the minimum number for a quorum), can only receive requests if they are able to communicate with 2 Master-eligible nodes. This can be seen as a pessimistic approach towards the split-brain problem, but it ensures your data stays consistent across the entire cluster and nodes don’t start their own little club.

To set a node as a master, simply set these variables in your elasticsearch.yml configuration file:

node.data: false
node.master: true
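And, for the three-Master cluster described above, you would enforce the quorum with a setting along these lines (Elasticsearch 1.x syntax):

discovery.zen.minimum_master_nodes: 2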

Shards

We use the routing parameter with all of our queries. When provided, it allows Elasticsearch to route both indexing and search requests to a single shard, ensuring you’re not sending requests to the entire cluster. I strongly suggest you examine your data for appropriate routing keys (e.g. user ID, organisation ID, etc.). This section assumes you’ve done the same.
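As an illustration, here's how routing looks with curl against a local node; the index, type, and field names are hypothetical:

$ curl -XPUT 'http://localhost:9200/tm/suggestion/1?routing=org_42' -d '{
    "source": "Hello",
    "target": "Hola",
    "length": 5
}'

$ curl -XGET 'http://localhost:9200/tm/suggestion/_search?routing=org_42' -d '{
    "query": { "match": { "source": "Hello" } }
}'

Both requests touch only the shard that org_42 routes to, rather than fanning out across the cluster.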

If you’ve just started with Elasticsearch, then your first question is probably going to be “How many shards?”

This can’t be answered by anyone else and it requires a good deal of testing and a little bit of foresight. Each shard in Elasticsearch is a fully-fledged Lucene index. Too few or too many and you’re going to bring your cluster to a crawl. The general idea is to fire up a single-server cluster (try to use the same hardware you’re going to be using in production), add some real-world data and then start running the same kinds of queries you’ll run in production until the queries become too slow for your use-case.

At this point, you can simply divide your total amount of data by the amount in your test cluster, giving you a rough number.

However, this method will only give you the number of shards you need at that moment. If you expect your dataset to grow, then you’re also going to have to consider that rate and adjust accordingly. But ultimately, and this is something we accepted, if Elasticsearch isn’t your source-of-truth, then you have the freedom to make mistakes. If you’re of the same opinion, then make sure you use aliases with your indexes. These will ensure you can build another index in parallel and then switch your queries over once it’s complete.

Know your data

It’s probably obvious, but the best thing you can do is to know your data and the queries you’ll be running over it. Elasticsearch will happily begin to accept documents from the very start without you configuring a single thing. Don’t do this. Make sure you know what it means to create an index and a document type, and the implications these decisions have on the final product.

The mapping of your documents is incredibly important, and I can’t stress enough how vital it is to understand what it means for a field to be stored or indexed, and the various analysis options provided. The Elasticsearch documentation is the best place to find out about these kinds of things.

Indexing

This is where we spent most of our energy. When we started working on Translation Memory 3.0, we nailed down some hard requirements. The most critical of these was the system’s ability to work in real-time.

Elasticsearch Translation Memory

We looked at a bunch of different options but couldn’t find anything to suit our needs. We use Postgres and we wanted something which would work well with it. Around this time, we found out that the Elasticsearch River API was being deprecated and the elasticsearch-jdbc plugin didn’t suit our needs.

So, we made our own. We created a library called Hermes to asynchronously receive and process event notifications emitted by Postgres’ NOTIFY.
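Hermes deserves a post of its own, but the Postgres side of such a design can be sketched with a trigger that emits a notification on every change (the table and channel names are hypothetical):

CREATE OR REPLACE FUNCTION notify_translation_change() RETURNS trigger AS $$
BEGIN
    -- publish the changed row as JSON on the 'translations' channel
    PERFORM pg_notify('translations', row_to_json(NEW)::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER translation_changed
AFTER INSERT OR UPDATE ON translations
FOR EACH ROW EXECUTE PROCEDURE notify_translation_change();

A listener (Hermes, in our case) then issues LISTEN translations and indexes each payload into Elasticsearch as it arrives.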

We’re going to be talking about the things we’ve created with Hermes in a later blog post. If you want to learn more then head over to the repository and take a look through the docs.

Filtering

Designing your queries can be tricky, and great care must be taken when deciding the best approach. One thing to note when querying, however, is filters. Filters provide a way to reduce the dataset that you ultimately need to query. In addition, they can be cached for even better performance!

In our case, determining the similarity between two strings in Transifex is done using the Levenshtein distance algorithm. This algorithm, especially at scale, can be very costly. So, we save the length of each string in our index along with a bunch of other metadata. When we need to look for similar strings, we calculate the length boundaries and then filter out anything which doesn’t meet the criteria – greatly reducing the strings we actually have to perform ‘work’ over.
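In the filtered-query syntax of the Elasticsearch 1.x era, the shape of such a query might look like the sketch below; the field names and boundaries are hypothetical:

{
    "query": {
        "filtered": {
            "filter": {
                "range": {
                    "length": { "gte": 8, "lte": 14 }
                }
            },
            "query": {
                "match": { "source": "Hello world" }
            }
        }
    }
}

Only documents whose stored length falls inside the boundaries are scored, so the expensive similarity work runs over a much smaller set.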

Monitoring

Marvel. I can’t recommend it enough. We used a mix of different graphing and monitoring tools but none came close to the clarity and convenience of Marvel. We used it during development and saw the value in it straight away. If your cluster is important to you and its continued running is critical to your business, then do yourself a favour and try it out!

Have you used Elasticsearch in production? What were the challenges you faced? Let us know in the comments below.

4 Critical Elements of Website Localization

Companies looking to go global must tailor their approach to each target locale to create a successful customer experience. Let’s dive into 4 elements to keep in mind when localizing your website.

1. Branding

What’s one of the first things your customer, or potential customer, will see upon entering your site? Your logo and tagline. You would like to think that your brand and messaging would be able to move seamlessly across borders and be understood by everyone. But that’s not always the case. When you’re globalizing a brand, it’s always a good idea to double check and make sure your tagline or logo won’t translate into something offensive or be misunderstood.

Consider these less than stellar examples of branding gone awry:

  • Clairol launched a curling iron called “Mist Stick” in Germany, even though “Mist” is German slang for manure.
  • Mercedes-Benz entered the Chinese market under the brand name “Bensi,” which means “rush to die” – probably not something you want to be associated with as an automotive company.
  • The American Dairy Association replicated its “Got Milk?” campaign in Spanish-speaking countries, where it was translated into “Are You Lactating?”

When translating your messaging, logo, tagline, etc., you should be aware of the pronunciation and spelling, and double-check that the actual meaning is culturally appropriate.

2. Space

The number of characters per word differs language by language. German uses 11.66 characters on average per word, while Swedish averages 8.51 per word, and Croatian averages 7.06.

Word count average by language
Image source.

You might be asking: so what? Can a difference of a few characters really change your web design layout? Short answer – yes, it can. When you’re dealing with multiple languages, the layout of text displayed on your website will need some tweaking between languages. If your button has space for 6 characters and another language requires 10, something will have to give.

One way to tweak your text is to play with font size. Look at Amazon’s homepage in Japanese and German. German tends to use more characters per word, so Amazon used a smaller font, while the Japanese text is laid out in a larger font. Both use almost the same amount of horizontal space, but the vertical space varies.

Amazon's homepage in Japanese and German
Japanese (left), German (right)

3. Culture-specific design

If your website uses graphics, imagery, or photos, chances are they were created with your initial audience in mind. If you’re going global, make sure the imagery from your source-language website works in your new locales; if it doesn’t, find replacements that fit. Adapting your images to a specific locale builds a better relationship with your users, while unrelatable images could ruffle some feathers.

For instance, you wouldn’t want to display hamburgers or beef steaks if India is your target market. Nor would you want to show someone wearing muddy boots indoors if your website targets a Japanese audience. You should also be aware that gestures don’t have the same meaning globally. For instance, if your image shows a person making a “V” sign facing inward, you might think it means “Peace, man,” but in the UK, it’s the equivalent of giving someone the middle finger in the US. And the “thumbs up” gesture that means “good” in one culture can be an entirely different, and unpleasant, phrase in another.

Dang it Bieber.

Color associations aren’t universal either. In North America, green is often associated with nature or wealth, while in some Asian cultures, green is often tied to sickness.

4. Simple (sometimes)

Sometimes good design is built upon simple forms, lines, text, and composition; other times it’s structured around complex forms and rich deliveries of information. A well-localized site means staying in tune with local design norms.

Most Americans prefer relatively simple designs; Google performs very well in the US. Naver, on the other hand, is a very successful search engine in South Korea that outperforms Google there (leaving Google with a market share of 36.9%). Why is that? One reason is Naver’s understanding of local design norms.

Naver vs. Google
Naver (left), Google (right)

Google’s homepage is a stark contrast to Naver’s. Naver’s homepage is much busier, using lots of images, sections, and banners. Unlike Google’s page layout with a single search box, Naver provides insights into search trends as well as the most frequently searched keywords. It’s tailored for a South Korean audience and reflects an understanding of what its users like and want.

Want another example of culturally adjusted web design? Take a peek at the website of Rakuten, a Japanese e-commerce company.

Rakuten's homepage in Japanese, Austrian, and American-English
Japanese (left), Austrian (center), American (right)

Not only has Rakuten localized their site into 13 different languages, but they also have a different design that works for the target culture. Rakuten Japan uses several colors and displays a lot of information above the page fold, while Rakuten USA uses more white space.

Research has repeatedly found that people’s visual preferences vary widely. Some countries, such as China, Singapore, and Malaysia, prefer more colorfulness, while countries such as Denmark, Switzerland, and Sweden prefer less. What this means is that if your website is consistent with cultural design norms, it will look familiar and create a better user experience.

Hammering it home

The one-size-fits-all approach doesn’t apply when it comes to localization. You might be asking yourself: why not just make one site and run its text through Google Translate? Why is good website usability so important? Because a good customer experience is what will bring your users back; localization is an investment in a better future.

Why Is Localization So Dang Hard?

This post initially appeared on Medium.


With so many software frameworks and development environments supporting the ability to internationalize software, why is localization still so difficult?

I18n, L10n, translation — what does it all mean?

Technically, the most direct way to build a global application is to 1) somehow figure out which locale the user wants and then 2) give the user a user interface specific to the requested locale. Prior to the Internet, software developers often built separate applications for every locale. Localization files had to be distributed with the application (usually on separate floppy disks; yes, we are going back that far) and the user had to pick the right floppy disk per language; this process was fairly awful. With the advent of the Internet and the proliferation of computer access globally, it has become common and easier to support multiple languages in the same app.

The problem of relying on the user to pick an application version based on language was partially solved by operating systems. OS developers built in the capability for the user to pick their locale during configuration. This advance hid the complexity from most users, whose operating system was set up for them. But while this was amazing progress for the user, for the software developer building the user interface, these changes did not go far enough.

Standards? Where did they fall short?

When learning about globalization, you can find a plethora of documentation on globalization standards. However, when it comes to actually implementing translations in a product or website, there is little guidance. The good news here is that the mechanism to display different languages in a software product, or what is commonly referred to as internationalisation (i18n), is a well understood software engineering problem. As a result, most development environments or web frameworks support i18n. Unfortunately, there is a downside.

Software developers tend to be a fairly disagreeable bunch, and so they disagreed about the *right* way to support i18n. It is in this lack of “universality” where the standards fall short. Each programming language implements a slightly different form of i18n with a slightly different approach. Some languages avoid this altogether and leave it up to frameworks and libraries to solve.

File formats had to become the standard

In the absence of clear guidelines, the software development community has had to find a way to manage translation assets. For this reason, they turned to file formats to specify the integration method. In some cases, the programming language simply adopted a well-known file format as a “method” of integration. Oh, you are using a PHP framework? Well, then you must be using PO files for managing your translations. However, there are a couple of key issues with a file-based approach.

  1. Version management is a nightmare. Developers often make multiple copies of translation files when building applications. This can lead to significant confusion around which set of files from the translators is the most current. Even worse, software development projects sometimes have last-minute text changes, and those changes often result in generating even more translation files.
  2. Process agility is sacrificed. In a file-based approach, the translation file needs to be completed with all translations, which generally blocks the development process. On large software projects, waiting on translators to finish can slow even the nimblest development teams. Evidence for this can be seen in the fact that many software startups bypass localization efforts completely in an effort to keep their development velocity high.
  3. We forgot DRY! With the file-based approach, translation files tend to be organized around a particular project, product, or website. After a few iterations, translators end up translating the exact same text copy again and again. If there is no process in place to limit this effect, it can spiral out of control in time and cost, just the same way real code does when we neglect DRY principles.

Looking for a better way


It was in this environment that Dimitris Glezos found himself when working with the Fedora Linux project in 2007. Back then, translation projects had grown so large and unmanageable that Red Hat developers were desperate for help. Dimitris came up with the idea for Transifex.

“The idea is that Transifex will act as a proxy/mediator for translation commits.”

Fast forward to 2015: Transifex is part of the cloud technologies landscape, but has this completely solved the problem? We’ve made great progress, but there is always more to be done.

This approach does gain some ground on versioning and agility. However, we have also added some new issues. Clearly, using the cloud to manage our translation files is just one step in solving this problem. Dimitris’ idea of needing a proxy/mediator between translators and developers still holds today. Transifex’s developer-centric approach aims to provide the management, storage, collaboration, and real-time monitoring that let companies launch products and content in multiple languages without slowing down development speed, thus solving these translation issues.

Taking a leap forward


Part of the problem with globalization is that, generally speaking, we’ve been going about it the wrong way. We’ve been focusing on translation management as an engineering problem and building developer-first solutions. But in order to take a leap forward, we need to solve the potentially harder, more people-focused issue and make translation efforts truly seamless for the individual. Here are three key aspects of doing this:

  1. Using software tools as an enabler. Software tools should enable us to build on a global level – they shouldn’t be used to define boundaries. Whatever approach we take, there will always be cases where issues arise. Our tools should be capable of helping guide us past those issues, and smart enough that they don’t come up again once they are solved.
  2. Appropriate context for everyone based on their role. Here are some examples. People performing the translator role need to see the text copy in the context of the website or application, not in some localization file format. Translation project managers need to see a dashboard of timelines and cost so they have the appropriate context for their role. And finally, developers should not need to spend time digging through the UI for translatable strings; this should happen seamlessly as part of their build process.
  3. Keeping translation cycles quick. Agile methods have transformed our approach to developing software. No longer do we spend months in dingy, poorly lit rooms building an application before validating it with product experts or even users. Translation projects can benefit the same way. By allowing for shorter cycles and more transparency, not only will timelines be reduced, but overall quality will likely improve as well. This approach enables us to fit to a process rather than force-fitting the process to us.

The world is just a big community

With the growth of the Internet, especially in countries outside of the US and Europe, we are quickly finding that the world as a whole is a community unto itself. Even though it is not practical for the entire world to agree on a single language, it *is* practical to expect our software and processes to make this as transparent as possible. When we are building new software products or websites we shouldn’t even need to *choose* for whom to make our product available. It should simply be available to all.

Find out additional information

The future is not as far as you might think!

  1. See how Transifex Cloud APIs help streamline file-based translation approaches https://www.transifex.com/product/
  2. See how Transifex Live is helping to create translation context for websites http://docs.transifex.com/live/quick-start-guide
  3. See how Transifex Command Line Client is automating software build processes http://docs.transifex.com/integrations/best-practices
  4. See Dimitris’ letter to the Fedora community https://lists.fedoraproject.org/pipermail/i18n/2007-July/000671.html