Booking.com, Spotify, Pinterest: How Kubernetes solved big problems for big companies
Another week, another geeky update! As I continue to share my knowledge and experience from the world of data engineering, this time I bring you an interesting look at Kubernetes. If you have been following my blog, you’ll know that in my last tutorial I explained what makes Kubernetes the ultimate container orchestration system and the platform of choice for scaling and deployment. In today’s article, I will walk through some important case studies of major companies using Kubernetes to solve their container orchestration problems.
Spotify, one of the biggest music streaming platforms with 130 million premium subscribers, was an early adopter of microservices and Docker. However, it was not until late 2017 that the platform decided to move from its homegrown orchestration system, Helios, to the feature-rich Kubernetes. But why did Spotify decide to take the big leap?
“We wanted to benefit from added velocity and reduced cost, and also align with the rest of the industry on best practices and tools”
- Jai Chakrabarti, Director of Engineering, Infrastructure and Operations.
In 2018, the team addressed the technology issues that had to be solved before migrating from the existing container orchestration platform to Kubernetes. The team used many Kubernetes APIs and extensibility features to support and interface with its legacy infrastructure, which made integration easier. Spotify began its migration to Kubernetes later that year with a small percentage of its fleet, containing over 150 services.
However, Spotify’s Kubernetes migration was not a smooth one: the team accidentally deleted all of its Kubernetes clusters not once but twice, though with little to no user impact. Since those incidents, the team has learned to operate many clusters automatically and safely. It has also reduced downtime and human error by defining clusters declaratively in code with Terraform, backing up and restoring clusters with Ark (since renamed Velero), and improving scalability and availability by running many more clusters.
Now, many of you may wonder how exactly Kubernetes helped Spotify. With Kubernetes, the time it takes the team to create a new service and get an operational host to run it in production dropped to a matter of seconds or minutes. Beyond easier scaling and time savings, migrating to Kubernetes also improved CPU utilisation by up to threefold. The biggest service running on the platform handles over 10 million requests per second. In the early days of its Kubernetes adoption, Spotify’s team also built a tool called Slingshot, which creates a temporary staging environment that self-destructs after 24 hours.
Booking.com adopted Kubernetes to achieve ‘sustainable scalability.’ In 2016, the online travel agency migrated to an OpenShift platform to give product developers faster access to infrastructure. A year later, the team built its own vanilla Kubernetes platform, and Booking.com moved to it.
Booking.com’s journey toward Kubernetes dates back to 2015, when a team at the travel platform prototyped a container platform based on Mesos and Marathon. However, to get the enterprise features it needed at its scale, the team adopted the OpenShift platform. Even though the platform offered a high-level CLI, developers faced a ‘knowledge bottleneck’ that kept them from supporting themselves, since most of them did not know Kubernetes was running underneath.
The team settled on a new solution: build a vanilla Kubernetes platform of its own and customise it. The team’s existing experience of operating OpenShift proved helpful in doing so.
“We have a tutorial. You follow the tutorial. Your code is running. Then, it’s business-logic time. The time to gain access to resources is decreased enormously.”
- Ben Tyler, Principal Developer, B Platform Track at Booking.com
So what changed for Booking.com with Kubernetes? Earlier, creating a new service could take a couple of days or weeks, depending on whether the developers understood Puppet. On the new platform, it can take as little as 10 minutes. In the first 8 months after adoption, the team built almost 500 new services on the platform, with hundreds of releases per day.
Other CNCF (Cloud Native Computing Foundation) technologies, including Envoy, Helm, and Prometheus, also support Booking.com’s goal of sustainable scaling. The team also developed Shipper, an extension for Kubernetes that adds more complex rollout strategies and multi-cluster orchestration.
Another interesting case study of Kubernetes solving big-brand problems is Pinterest. In 2016, the social media service decided to move to a new compute platform that could be both agile and seamless. It started by moving its services to Docker containers; once these services were in production, the team began looking at orchestration, and Kubernetes became the obvious choice.
How did Kubernetes give Pinterest ease of scale and simpler deployment? With more than 200 million monthly active users and a thousand microservices running under the hood, Pinterest’s pain point was a lack of velocity in taking an idea to production, caused by an inconsistent and complex end-to-end developer experience. With Kubernetes, the Pinterest team was able to build on-demand scaling and new failover policies, and to simplify the overall deployment and management of Jenkins.
“We not only saw reduced build times but also huge efficiency wins. For instance, the team reclaimed over 80 percent of capacity during non-peak hours. As a result, the Jenkins Kubernetes cluster now uses 30 percent less instance-hours per-day when compared to the previous static cluster.”
- Micheal Benedict, Product Manager for the Cloud and the Data Infrastructure Group at Pinterest.
In July 2017, Pinterest chose Kubernetes over other orchestration platforms to address the issues of running on virtual machines. At the beginning of 2018, the team began onboarding its first use case, Jenkins workloads, onto Kubernetes. By the end of Q1, the team had successfully migrated Jenkins Master to run on Kubernetes.
Spotify, Booking.com, and Pinterest are just a few of the many companies running Kubernetes in production at global scale. Given its ease of access, customizability, and scalability, Kubernetes is poised to become the new cloud platform.