Autoscaling in Transifex
Over the last 18 months, things have been moving very fast for the engineering and devops team at Transifex. We have adopted Docker, moved away from our monolith towards microservices, and introduced Kubernetes. The added complexity hasn't been trivial, and the team had to become comfortable with many new technologies, but the gains in performance and flexibility have greatly improved the way we develop and release new products and features.
Our goal on the devops team is to provide our engineers with the tooling needed to deliver stable software. That, and keeping the platform up and running, of course! By ensuring continuous feedback between the teams to spot gaps, issues, and inefficiencies, the team is adopting and introducing new tooling at a constant rate. One of our latest additions is horizontal pod autoscaling (HPA), which is responsible for scaling our pods up or down based on certain external metrics.
Why scale up or down, you ask? Well, our resources are not unlimited, and no one likes to sit on 50 pods of a service that normally needs only 10, except for that one time when traffic spikes and the service needs additional capacity. Scaling our pods up and down gives us the observability we need, and it frees up a good amount of resources for the other services that we wanted to expand but found too expensive to grow.
First things first, a few words about scaling. No one knows better than the developers who wrote a service when and how it should scale. Ideally, engineers should have the tools (both technical and monitoring ones) and the knowledge to set up the scaling of their application. In the past, that would have been very difficult for the average engineer at Transifex. People had to know Ansible, be familiar with our infrastructure, and understand how to test and deploy the application. With our modern setup, you only need to be familiar with Helm. Before we look at our toolset, let's get a bit more technical.
Out of the box stuff
Luckily, Kubernetes supports pod autoscaling almost without any hassle. You only need to install the 'metrics-server', an efficient source of container resource metrics, which exposes CPU and memory usage for the pods in your cluster. The metrics server is lightweight, scalable, and extremely easy to set up. Out of the box, it can be queried with 'kubectl top', making it easy to debug your pods, and combined with HPA it can scale your pods up or down based on those metrics. But it can't be that simple, can it?
Unfortunately, it is often difficult to keep up with your service SLAs if your application scales only on CPU and memory. For this reason, we need to be able to scale on other metrics, internal or external, so it gets just a bit more complicated than that. But first things first, let's see our APIs.
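To make the CPU-based case concrete, here is a minimal Python sketch of the scaling rule described in the Kubernetes HPA documentation: the desired replica count is the current count multiplied by the ratio of the observed metric to its target, rounded up. The function name and the sample numbers are ours, chosen only for illustration, and the sketch deliberately leaves out the tolerance band, pod readiness handling, and stabilization windows that the real controller applies.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Simplified HPA rule: ceil(currentReplicas * currentMetric / targetMetric)."""
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    return math.ceil(current_replicas * current_value / target_value)


if __name__ == "__main__":
    # Hypothetical numbers: 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
    # The same 4 pods averaging 30% CPU against the same target.
    print(desired_replicas(4, 30, 60))
```

With these made-up inputs the first call asks for more pods than are currently running and the second asks for fewer, which is the whole idea of scaling in both directions.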
An introduction to the metrics APIs in Kubernetes
The API is the heart of Kubernetes. The API server exposes an HTTP API that lets users, cluster components, and external services communicate with each other. Once the metrics server is installed, the API also serves the CPU and memory data that we mentioned previously.
The API, however, doesn't stop there. Its standard interface supports two additional endpoints, one for custom metrics and one for external metrics. Custom metrics are associated with a Kubernetes object, while external ones are, well, associated with an external service. As an example, if your service exposes data to Prometheus, you can use those metrics to scale its pods. Likewise, if you have data in an external source (like an SQS queue or a RabbitMQ cluster), you can scale up or down based on that.
In our case, the setup includes Django, Celery and RabbitMQ. RabbitMQ already exposed its metrics to a Prometheus server for monitoring and alerting through Grafana. Our requirement was to scale our Django workers (used to consume tasks from RabbitMQ) based on the number of tasks in each queue.
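As a rough illustration of the three endpoints, the Python sketch below builds the request paths for the resource, custom, and external metrics APIs. The API group names and the v1beta1 path layout follow the upstream Kubernetes metrics API conventions; the namespace and metric names are hypothetical placeholders, not values from our cluster.

```python
# API groups served through the Kubernetes API aggregation layer.
API_GROUPS = {
    "resource": "metrics.k8s.io",           # CPU / memory from metrics-server
    "custom": "custom.metrics.k8s.io",      # metrics tied to a Kubernetes object
    "external": "external.metrics.k8s.io",  # metrics from outside the cluster
}
VERSION = "v1beta1"


def resource_metrics_path(namespace: str) -> str:
    """Path listing CPU and memory usage for all pods in a namespace."""
    return f"/apis/{API_GROUPS['resource']}/{VERSION}/namespaces/{namespace}/pods"


def custom_metric_path(namespace: str, metric: str, resource: str = "pods", name: str = "*") -> str:
    """Path for a custom metric attached to an object ('*' selects all of that kind)."""
    return f"/apis/{API_GROUPS['custom']}/{VERSION}/namespaces/{namespace}/{resource}/{name}/{metric}"


def external_metric_path(namespace: str, metric: str) -> str:
    """Path for an external metric; no Kubernetes object is involved."""
    return f"/apis/{API_GROUPS['external']}/{VERSION}/namespaces/{namespace}/{metric}"


if __name__ == "__main__":
    # "default", "http_requests" and "my_queue_messages_ready" are made-up names.
    print(resource_metrics_path("default"))
    print(custom_metric_path("default", "http_requests"))
    print(external_metric_path("default", "my_queue_messages_ready"))
```

Any of these paths can be passed to `kubectl get --raw` to inspect what the corresponding API currently returns.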
Introducing Prometheus adapter
The issue with custom and external metrics is that they increase complexity a little bit more. In order to implement them, you need:
- A way to collect your desired metrics from various sources. Prometheus is our obvious choice here, but there are a few alternatives (e.g. Google's Stackdriver).
- A metrics API server, responsible for converting data from the metric collector mentioned above into the external metrics API we discussed a while back. A popular choice here is the Prometheus adapter, which provides an implementation of the custom and external metrics APIs and supports arbitrary metrics.
Putting everything together, our solution looks something like this:
The Prometheus adapter is the missing link that connects everything together. It is responsible for serving the custom and external metrics APIs that HPA consumes. But how does the adapter know how to link the data it gets with the Kubernetes resources it needs to scale, when that data is not connected with any particular namespace or pod? Normally, if a pod were exposing Prometheus metrics, the exporter would know the connection. But what about metrics that come from an external source, like RabbitMQ in our case?
Thankfully, the Prometheus adapter comes with an extensive configuration, which can discover, associate, name, and query your metrics, so it basically converts your data to a Kubernetes object.
Below is an example of such a configuration rule from our setup:
The very first line discovers the available metrics (discovery). Then, we tell the adapter which Kubernetes resources each metric is associated with (association). In our case, <<.Resource>> points to the resource provided through the API call, since RabbitMQ doesn't hold any information about pods or namespaces. Then, we give our metric a unique name to distinguish it from the rest (naming), and finally we shape it into a valid expression in the Prometheus query language (querying). The <<.Series>> key refers to the seriesQuery in the first line. After all this effort, we will end up with the following Prometheus query:
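As a rough sketch of how the four steps fit together, the Python snippet below models a rule with the adapter's field names (seriesQuery, resources, name, metricsQuery) and expands the <<.Series>> placeholder into a final query. Everything inside the rule is an assumption for illustration: the metric name `rabbitmq_queue_messages_ready`, the queue `my_queue`, and the exposed name `my_queue_messages_ready` are hypothetical and are not our actual configuration. The real adapter uses Go templates and supports more placeholders than the single one handled here.

```python
# Hypothetical rule, written as a dict that mirrors the adapter's YAML fields.
RULE = {
    # Discovery: which Prometheus series to look for.
    "seriesQuery": "rabbitmq_queue_messages_ready",
    # Association: take the resource from the API call itself.
    "resources": {"template": "<<.Resource>>"},
    # Naming: the unique name exposed through the external metrics API.
    "name": {"as": "my_queue_messages_ready"},
    # Querying: the expression sent to Prometheus.
    "metricsQuery": 'sum(<<.Series>>{queue="my_queue"})',
}


def render_metrics_query(rule: dict) -> str:
    """Replace <<.Series>> with the series name taken from seriesQuery."""
    # Keep only the metric name, dropping any label matchers in braces.
    series = rule["seriesQuery"].split("{", 1)[0].strip()
    return rule["metricsQuery"].replace("<<.Series>>", series)


if __name__ == "__main__":
    print(render_metrics_query(RULE))
```

The rendered string is what would be sent to Prometheus each time the metric named in the rule is requested.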
To see it running in action, just try out the external metrics API:
The query involves a random namespace in a random pod, but it doesn't matter for now. The interesting piece of data here is the value "7", which means that there are 7 tasks currently in that queue. HPA itself will use this exact API call to get its data, so it's good to know that it's working.
Putting everything together, with a sprinkle of Kubernetes magic, we have our HPA up and running.
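For readers who want to check the value programmatically, here is a small Python sketch that parses a response shaped like the ExternalMetricValueList object returned by the external metrics API. The sample payload is hand-written for illustration (the metric name and timestamp are made up); only the value "7" mirrors the example above. Values come back as Kubernetes quantities, so the sketch also accepts the milli suffix such as "7500m".

```python
import json

# Hand-written sample in the shape of an external metrics API response.
SAMPLE_RESPONSE = """
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "my_queue_messages_ready",
      "metricLabels": {},
      "timestamp": "2020-01-01T00:00:00Z",
      "value": "7"
    }
  ]
}
"""


def parse_quantity(quantity: str) -> float:
    """Parse a plain number or a milli-unit quantity such as '7500m'."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def queue_length(raw_response: str) -> float:
    """Sum the values of all items in the response."""
    items = json.loads(raw_response)["items"]
    return sum(parse_quantity(item["value"]) for item in items)


if __name__ == "__main__":
    print(queue_length(SAMPLE_RESPONSE))
```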
Talking about HPA, it couldn’t be any simpler:
A few interesting things:
- Type is external, since our RabbitMQ cluster lives outside our cluster.
- The metric name is the one we created previously in the adapter configuration. Since each queue has its own HPA object, the names must be different.
- The target value is the number of tasks after which a new pod will be created. In our case, it means that we scale our pods up or down for every 100 tasks.
- We are careful with our resources, so we set a comfortable min and max for our replicas.
- Perhaps most important is the scale target. This refers to the deployment with the given name, and it is our <<.Resource>> in the adapter configuration.
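The points above can be sketched in Python as a function that builds an HPA manifest and a helper that works out how many pods a given queue length would ask for. This is a sketch under stated assumptions, not our production object: it uses the `autoscaling/v2` API shape, assumes the target is expressed as an `AverageValue` (tasks per pod), and the deployment name, metric name, and the min and max of 2 and 20 are hypothetical. Only the target of 100 tasks comes from the text. The helper ignores the tolerance and stabilization behavior of the real controller.

```python
import json
import math


def build_hpa(metric_name: str, deployment: str, tasks_per_pod: int = 100,
              min_replicas: int = 2, max_replicas: int = 20) -> dict:
    """Build an HPA manifest that scales a Deployment on an external metric."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            # The scale target: the Deployment whose replicas are adjusted.
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "External",
                "external": {
                    "metric": {"name": metric_name},
                    "target": {"type": "AverageValue", "averageValue": str(tasks_per_pod)},
                },
            }],
        },
    }


def replicas_for(tasks_in_queue: int, hpa: dict) -> int:
    """Pods wanted for a queue length: ceil(tasks / target), clamped to min/max."""
    spec = hpa["spec"]
    target = int(spec["metrics"][0]["external"]["target"]["averageValue"])
    wanted = math.ceil(tasks_in_queue / target)
    return max(spec["minReplicas"], min(spec["maxReplicas"], wanted))


if __name__ == "__main__":
    hpa = build_hpa("my_queue_messages_ready", "my-queue-worker")
    print(json.dumps(hpa, indent=2))  # kubectl accepts JSON manifests as well as YAML
    for tasks in (7, 730, 5000):
        print(tasks, "tasks ->", replicas_for(tasks, hpa), "pods")
```

Keeping the per-queue values as function arguments reflects the point that each queue gets its own HPA object with its own metric name.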
We have discussed a lot of different components and concepts, and this is only the tip of the iceberg. You can adjust HPAs to support multiple metrics, or you can (and you should) monitor your HPA objects (we do it in Grafana) in order to hit the sweet spot between minimum and maximum pods.
Let’s step back for a moment and see what we have achieved so far. We have made an important part of our application scalable, based on external metrics. We have freed up resources and given ourselves better observability for our cluster. But, most importantly, we have given a necessary new toolset to the people who know the application best: its developers. This is no small task, and it’s one more step towards the high levels of ownership that we strive for at Transifex.
With our latest Transifex Native solution, you can now manage all your global content in one central place and save time on deployment. To learn how Transifex Native can help you make localization a seamless part of the development lifecycle, visit www.transifex.com/native.