In this episode of the NVIDIA AI Podcast, host Noah Kravitz interviews Ian Buck, NVIDIA's VP of Hyperscale and High-Performance Computing, about mixture-of-experts (MoE), an AI architecture that improves efficiency and reduces costs. Buck explains how MoE works, comparing it to a human brain that activates only the neurons needed for a given task: the model is divided into smaller, specialized "experts," and only a few of them run for each input. The conversation covers the "DeepSeek moment" that popularized MoE, the symbiotic relationship between AI hardware and models, and NVIDIA's role in co-designing systems such as the GB200 with NVLink to speed up communication between experts, lowering the cost per token and advancing AI capabilities across a range of applications. Buck points listeners to GTC for more information.
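For listeners who want to see the routing idea Buck describes in code, below is a minimal sketch of top-k expert routing. It is not NVIDIA's or DeepSeek's implementation; the class names, sizes, and `top_k` parameter are illustrative assumptions, written in PyTorch-style Python.

```python
# Minimal mixture-of-experts sketch (illustrative only): a router picks a few
# experts per token, so most of the model's parameters stay idle per input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyExpert(nn.Module):
    """One small feed-forward 'expert'."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x):
        return self.net(x)

class TinyMoE(nn.Module):
    """Routes each token to top_k experts instead of running the full model."""
    def __init__(self, d_model=64, d_hidden=256, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            TinyExpert(d_model, d_hidden) for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the rest stay idle,
        # which is where MoE's efficiency and cost savings come from.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

if __name__ == "__main__":
    moe = TinyMoE()
    tokens = torch.randn(16, 64)
    print(moe(tokens).shape)  # torch.Size([16, 64])
```

In production systems the experts are sharded across GPUs, so the per-token routing shown above turns into cross-device traffic, which is why interconnects like NVLink in the GB200 come up in the episode.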