Ankur Goyal of Braintrust discusses the current state and future of evaluations (evals) in AI product development. He notes that although Braintrust customers run thousands of evals daily, the process remains largely manual. Goyal then introduces "Loop," an AI agent built into Braintrust that automates the optimization of prompts, datasets, and scorers. He attributes Loop's effectiveness to recent advances in frontier models, particularly Claude 4, which he says significantly outperforms previous models. With Loop, users can optimize prompts and agents, build better datasets and scorers, and review suggested edits side by side within the Braintrust UI. He concludes by inviting users to try Loop and provide feedback, and mentions that Braintrust is hiring.