In this interview, Michael Kagan, CTO of Nvidia, discusses the critical role of Mellanox's interconnect technology, acquired by Nvidia in 2019, in driving Nvidia's dominance in AI computing. Kagan explains how the exponential growth in computing demand, especially with the rise of AI, makes it necessary to scale performance beyond single components, emphasizing the importance of high-speed, high-performance networks both for scaling up within a node (NVLink) and for scaling out across multiple nodes. He highlights the challenges of managing large-scale GPU clusters and their solutions, including handling hardware failures, optimizing communication between GPUs, and meeting the evolving demands of inference workloads alongside training. Kagan also touches on the partnership between Nvidia and Intel, Nvidia's win-win culture, and his vision for the future of AI, including AI's potential to advance our understanding of physics.