09 Aug 2025

The Future of Evals - Ankur Goyal, Braintrust

AI Engineer

Ankur Goyal of Braintrust discusses the current state and future of evaluations (evals) in AI product development. He notes that although Braintrust customers run thousands of evals daily, the process remains largely manual. Goyal then introduces "Loop," an AI agent built into Braintrust that automates the optimization of prompts, datasets, and scorers. He attributes Loop's effectiveness to recent advances in frontier models, particularly Claude 4, which he says significantly outperforms previous models. With Loop, users can optimize prompts and agents, build better datasets and scorers, and review suggested edits side by side within the Braintrust UI. He concludes by inviting users to try Loop and share feedback, and mentions that Braintrust is hiring.

Outlines

00:00 The Evolution of Evals and the Introduction of Loop in Braintrust