17 Feb 2026 · 43 min

Evals, Feedback Loops, and the Engineering That Makes AI Work

AI + a16z

The discussion centers on how AI development interacts with traditional systems engineering. It asks whether the "bitter lesson" (that simply scaling compute wins) will keep dominating, or whether engineering rigor will become essential. Ankur Goyal, founder and CEO of Braintrust, draws on his experience to argue that well-engineered testing and feedback loops (evals) remain necessary to manage the complexity of AI models, even as those models improve. Goyal and a16z general partner Martin Casado compare closed and open-source models, noting that Chinese models show high token usage but lower dollar spend. They also weigh model quality against engineering efficiency, suggesting that enterprises may hit a limit on how much ever-improving AI capability they can absorb. Finally, the conversation covers the surprising finding that SQL outperforms Bash in agent benchmarks, which underscores the value of computer science fundamentals in AI development.

Outline

Part 1: Background, Core Concepts

Part 2: Market Dynamics, Benchmarks

Part 3: Adoption, Economics

Part 4: Engineering Fundamentals