Kubernetes at LinkedIn, with Ahmet Alp Balkan and Ronak Nathani

This episode covers how LinkedIn runs Kubernetes at scale. The guests describe LinkedIn's transition from a proprietary containerization system to Kubernetes and the complexity of managing stateful workloads, including databases, on bare-metal infrastructure. They detail the custom components they built for this, a generic stateful workload operator and an Application Control Manager (ACM), which give granular control over maintenance and updates even for stateful systems. For example, the ACM communicates with the database to coordinate maintenance without disrupting operations. On infrastructure management, the team uses Kubernetes as an orchestration layer for its bare-metal servers, which lets it manage data center inventory programmatically. Rather than standard Kubernetes components such as KubeDNS, LinkedIn uses a custom networking stack for performance. The takeaway for other organizations is that running Kubernetes at extreme scale requires a deep understanding of the system and a willingness to build custom solutions where off-the-shelf components fall short.

Outlines

Kubernetes Podcast from Google

00:00 Podcast Introduction and Kubernetes News

01:57 LinkedIn's Kubernetes Journey and Stateful Systems

04:53 Custom Stateful Workload Operator and Application Control Manager (ACM)

08:26 etcd Management and Alternatives to etcd

12:45 Bare Metal Infrastructure Management

14:47 Cluster Size and Hardware Upgrades

17:03 Ensuring Predictable Performance Across Different Hardware

21:50 Custom Controllers and Development Pitfalls

26:21 Evaluating Open Source Components

30:35 Platform Engineering at LinkedIn and Developer Experience

34:56 Balancing Abstraction and Kubernetes Awareness, Incident Discussion, and Conclusion