Mark McKillop and Alberto MediaVille discuss how Meta scaled its backbone network to meet growing demand, particularly from AI workloads. They trace the evolution of Meta's backbone architecture, explain the differences between the Classic Backbone (CBB) and Express Backbone (EBB) networks, and focus on the scalability challenges of EBB. Alberto details the techniques used to achieve a 10x increase in backbone capacity: pre-building DC metro architecture components, platform scaling (both scaling up and scaling out), and IP and optical integration using ZR technology. Mark then discusses the challenges of building larger GPU clusters and the solutions adopted, including the need for high fiber counts and the deployment of sophisticated optical technology. They conclude with key learnings, emphasizing the importance of innovation, pre-building scalable designs, IP and optical integration, and reusing existing technologies, and they preview future plans involving a leaf-and-spine architecture and continued AI backbone development.