10 Mar 2026
1h 23m

NVIDIA's AI Engineers: Agent Inference at Planetary Scale and "Speed of Light" — Nader Khalil (Brev), Kyle Kranen (Dynamo)

Latent Space: The AI Engineer Podcast

This episode of Latent Space explores the evolving landscape of AI development, developer experience, and the future of inference. Nader Khalil and Kyle Kranen of NVIDIA discuss NVIDIA's acquisition of Brev, a developer tool for GPU access, and the company's broader developer-experience strategy, which emphasizes understanding the end user. They introduce Dynamo, a data-center-scale inference engine that accelerates inference with techniques such as disaggregation, and explain "SOL" (Speed of Light), a way to create urgency by reasoning from the theoretical limits of a project's timeline. The conversation also covers the shift toward hardware-model co-design, the challenges of long-context models, and the potential of agents to streamline workflows, and it addresses security concerns.

Outline

Part 1: Security and Agent Capabilities

Part 2: NVIDIA, Brev, and Developer Experience

Part 3: The SOL Philosophy and Internal Culture

Part 4: Dynamo and Inference Scaling

Part 5: Context Length and Model Architecture

Part 6: Agents, Security, and Developer Tools

Part 7: Future Outlook and Community
