Why Is Localization So Dang Hard?
This post initially appeared on Medium.
With so many software frameworks and development environments supporting the ability to internationalize software, why is localization still so difficult?
Table of Contents
I18n, L10n, translation — what does it all mean?
Technically, the most direct way to build a global application is to 1) somehow figure out which locale the user wants and then 2) give the user an user interface specific to the locale requested. Prior to the Internet, often software developers would build separate applications for every locale. Localization files had to be distributed with the application (usually on separate floppy disks, yes, we are going back that far) and the user had to pick the right floppy disk per language; this process was fairly awful. With the advent of the Internet and the proliferation of computer access globally, it has become common and easier to support multiple languages in the same app.
The problem that arose from relying on the user to pick application versions based on language was partially solved by operating systems. The OS software developers built in the capability for the user to pick their locale during configuration. This advance limited the exposure to most users whose operating system was set up for them. While this has been amazing progress for the user, for the software developer who is building the user interface, these changes did not go far enough.
Standards? Where did they fall short?
When learning about globalization, you can find a plethora of documentation on globalization standards. However, when it comes to implementing translations for a product or website, there is little guidance. The good news here is that the mechanism to display different languages in a software product, or what is commonly referred to as internationalization (i18n), is a well-understood software engineering problem. As a result, most development environments or web frameworks support i18n. Unfortunately, there is a downside.
Software developers tend to be a fairly disagreeable bunch, and so they disagreed about the *right* way to support i18n. It is in this lack of “universality” where the standards fall short. Each programming language implements a slightly different form of i18n with a slightly different approach. Some languages avoid this altogether and leave it up to frameworks and libraries to solve.
File formats had to become the standard
In the absence of clear guidelines, the software development community has had to find a way to manage translation assets. For this reason, they turned to file formats to specify the integration method. In some cases, the programming language simply adopted a well-known file format as a “method” of integration. Oh, you are using a PHP framework? Well, then you must be using PO files for managing your translations. However, there are a couple of key issues with a file-based approach.
- Version management is a nightmare. Developers often make multiple copies of translation files when building applications. This can lead to significant confusion around which set of files are the most current from the translators. Or even worse… software development projects sometimes have last minute text changes. Those changes often result in generating even more translations files.
- Process agility is sacrificed. In a file-based approach, the translation file needs to be completed with all translations and generally blocks the development process. On large software projects, having external blocks waiting for translators to complete can often slow even the nimblest development teams. Evidence for this can be seen in the fact that many software startups bypass any localization efforts completely in an effort to keep their development velocity high.
- We forgot DRY! Often with the file-based approach translation management tends to organize the translation files around a particular project, or product, or website. After a few iterations, translators are translating the exact same text copy again and again. If there is no process in place to limit this effect it can spiral out of control in time and cost just the same way that real code does when we neglect DRY principles.
Looking for a better way
It was in this environment that Dimitris Glezos found himself when working with the Fedora Linux project in 2007. Back then, translation projects had grown so large and unmanageable that Red Hat developers were desperate for help. Dimitris came up with the idea for Transifex.
“The idea is that Transifex will act as a proxy/mediator for translation commits.”
Fast forward to 2015, Transifex is part of the cloud technologies landscape but has this completely solved the problem? We’ve made great progress, but there is always more to be done.
This approach does gain some ground on versioning and agility. However, we have also added some new issues. Clearly, using the cloud to manage our translation files is just one step in solving this problem. Dimitris idea of needing a proxy/mediator between translators and developers still persists even today. Transifex’s developer-centric approach aims to ease management, storage, collaboration, and real-time monitoring that will allow companies to launch products and content in multiple languages without slowing down the development speed; thus, solving these translation issues.
Taking a leap forward
Part of the problem with globalization is that generally speaking, we’ve been going about it the wrong way. We’ve been focusing on translation management as an engineering problem and have been building developer first focused solutions. But in order to take a leap forward, we need to solve the potentially harder, more people focused issue and make translation efforts truly seamless for the individual. Here are three key aspects of doing this:
- Using software tools as an enabler. Software tools should enable us to build on a global level — they shouldn’t be used to define boundaries. There will always be cases in whatever approach we take where issues arise. Our tools should be capable of helping guide us past those issues and smart enough to such that they don’t come up again once they are solved.
- Appropriate context for everyone based on their role. Here are some examples. People who are performing the translator role need to see the text copy in the context of the website or application and not in some localization file format. Translation project managers need to see a dashboard of timelines and cost so they can have the appropriate context for their role. And finally, developers should not need to spend time digging through UI for translatable strings…this should happen seamlessly as part of their build process.
- Keeping translations cycles quick. Agile methods have transformed our approach to developing software. No longer do we spend months in dingy poorly lit rooms building an application before validating it with product experts or even users. Translation projects can benefit the same way. By allowing for shorter cycles, and more transparency not only will timelines be reduced, but overall quality will likely improve as well. This approach enables us to fit a process rather than force fitting the process to us.
The world is just a big community
With the growth of the Internet, especially in countries outside of the US and Europe, we are quickly finding that the world as a whole is a community unto itself. Even though it is not practical for the entire world to agree on a single language, it *is* practical to expect our software and processes to make this as transparent as possible. When we are building new software products or websites we shouldn’t even need to *choose* for whom to make our product available. It should simply be available to all.
Find out additional information
The future of localization is not as far as you might think.
- See how Transifex Cloud APIs helps streamline file-based translation approaches https://www.transifex.com/how-it-works/
- See how Transifex Live is helping to create translation context for websites https://docs.transifex.com/live/introduction
- See how the Transifex Command Line Client is automating software build processes https://docs.transifex.com/client/introduction
- See Dimitris’ letter to the Fedora community https://lists.fedoraproject.org/pipermail/i18n/2007-July/000671.html