YouTube · 17 Apr 2025
20m

Building and evaluating AI Agents — Sayash Kapoor, AI Snake Oil

AI Engineer

In this presentation, Sayash Kapoor discusses the current state of AI agents, highlighting the gap between ambitious visions and real-world performance. He identifies three key reasons for this discrepancy: the difficulty of evaluating agents, the misleading nature of static benchmarks, and the confusion between capability and reliability. Kapoor argues that current evaluation methods are inadequate for capturing the complexities of agent behavior, especially regarding cost and real-world applicability. He emphasizes the need for multi-dimensional benchmarks, human-in-the-loop validation, and a shift in mindset towards reliability engineering to address the inherent stochasticity of AI agents and ensure their successful deployment.
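The distinction between capability and reliability can be illustrated with a little arithmetic. The sketch below is not taken from the talk. It assumes an agent whose attempts succeed independently with a fixed probability p, which is a simplification, and the function names and the value p = 0.9 are hypothetical. It contrasts the chance of succeeding at least once in k attempts (one common reading of capability) with the chance of succeeding on every one of k attempts (one reading of reliability).

```python
def capability(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** k


def reliability(p: float, k: int) -> float:
    """Probability that all k independent attempts succeed."""
    return p ** k


if __name__ == "__main__":
    p = 0.9  # assumed per-attempt success rate, chosen only for illustration
    for k in (1, 5, 10):
        print(
            f"k={k:2d}  at-least-once={capability(p, k):.3f}  "
            f"every-time={reliability(p, k):.3f}"
        )
```

Under these assumptions the two measures move in opposite directions as k grows: repeated sampling makes an agent look more capable, while the chance of it succeeding every time falls, which is why a system that can do a task is not necessarily one that can be relied on to do it.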
