How Transifex Ensures High Availability
Our commitment to delivering an exceptional customer experience extends to our application’s uptime and performance. This article explores how Transifex ensures high availability for its critical components.
What is High Availability?
High availability means that the infrastructure operates at a high level, continuously and without intervention, for a designated period. A high-availability system, component, or application is configured to work with minimal, close-to-zero downtime.
What is the Difference Between 99.9% and 99.99% Uptime?
To put it simply:
99.9% uptime means 8 hours, 45 minutes, and 56 seconds of allowed downtime per year, whereas 99.99% uptime means 52 minutes and 35 seconds of allowed downtime per year. The difference is almost 8 hours of downtime per year.
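The arithmetic behind these figures can be reproduced with a short sketch. It assumes an average year of 365.2425 days, which is the year length that matches the numbers quoted above; the function name `allowed_downtime` is illustrative only.

```python
from datetime import timedelta

# Assumption: an average Gregorian year of 365.2425 days.
# This year length reproduces the downtime figures quoted in the text.
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60


def allowed_downtime(uptime_percent: float) -> timedelta:
    """Return the yearly downtime budget for a given uptime percentage.

    The result is truncated to whole seconds.
    """
    downtime_fraction = (100 - uptime_percent) / 100
    return timedelta(seconds=int(SECONDS_PER_YEAR * downtime_fraction))


if __name__ == "__main__":
    three_nines = allowed_downtime(99.9)
    four_nines = allowed_downtime(99.99)
    print("99.9%  ->", three_nines)
    print("99.99% ->", four_nines)
    print("difference ->", three_nines - four_nines)
```

With a 365-day year instead, the 99.9% budget comes out about 20 seconds shorter, so the exact seconds depend on the year length chosen.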
Our highly available cloud architecture and unwavering focus on reliability have resulted in uptime greater than 99.99% since early 2022.
Uptime during the last 6 months: TMS: 99.87% – APIv3: 99.88% – CDS: 99.99%
Uptime during the last month: TMS: 99.99% – APIv3: 100% – CDS: 100%
You can always visit https://status.transifex.com. This page reflects the status of our services and documents any interruptions.
How Transifex is Achieving High Availability and Zero Downtime
High availability is essential for mission-critical systems. Transifex is a complex application made up of various components, and our DevOps and Engineering teams monitor its uptime 24/7 using pingdom.com. Our most critical components include Core TMS, APIv3, and our delivery networks: Transifex Live & Native CDS.
Transifex Content Delivery Service (CDS) is a standalone, lightweight service responsible for the super fast, over-the-air delivery of the translated content from Transifex to the application and, eventually, the end-user.
On top of this, we use Amazon Route 53 for advanced routing. More specifically, Route 53 lets us ensure that all requests to CDS are routed to the appropriate region based on availability. In a worst-case scenario, when a region is down, its status is set to “unhealthy,” and all requests are routed to the other regions until the issue is resolved.
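The failover behavior described above can be illustrated with a small simulation. This is a minimal sketch of health-based routing in general, not Transifex's actual configuration or the Route 53 API. The region labels, the `Region` class, the `pick_region` function, and the notion of a "preferred" region are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """A hypothetical serving region with a health flag."""
    name: str
    healthy: bool = True


def pick_region(regions: List[Region], preferred: str) -> str:
    """Return the preferred region if it is healthy.

    Otherwise fall back to the first healthy region in the list.
    Raises RuntimeError when no region is healthy.
    """
    for region in regions:
        if region.name == preferred and region.healthy:
            return region.name
    for region in regions:
        if region.healthy:
            return region.name
    raise RuntimeError("no healthy region available")


if __name__ == "__main__":
    # Hypothetical region labels, for illustration only.
    regions = [Region("region-a"), Region("region-b")]
    print(pick_region(regions, "region-a"))  # preferred region is healthy

    # Simulate an outage: the region is marked unhealthy,
    # so requests go to another region until it recovers.
    regions[0].healthy = False
    print(pick_region(regions, "region-a"))
```

In a managed DNS service, the health flag would come from automated health checks rather than being set by hand, and the choice among healthy regions would follow the configured routing policy.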