13 Nov 2024
32m

Maglev: load balancing at Google with Cody Smith and Trisha Weir

Google SRE Prodcast

In this episode of the Prodcast, Cody Smith and Trisha Weir recount how they rebuilt Google's front-end infrastructure as the system known as Maglev. Faced with the limitations and high cost of their vendor's network load balancers, they started a skunkworks project. Drawing on internal expertise and a high-trust environment, they quickly developed and launched a new system. They rolled it out as a gradual migration centered on virtual IP addresses (VIPs), an approach that underscored the value of iterative development, careful monitoring, and adapting to the system's complexities. Their conversation also highlights the importance of generalist skills, psychological safety, and teamwork within Site Reliability Engineering (SRE) teams.
