YouTube, 29 Jul 2025
40m

[Full Workshop] Building Metrics that actually work — David Karam, Pi Labs (fmr Google Search)

AI Engineer

This workshop features two main speakers, with contributions from audience members, and focuses on the challenges and methods of evaluating AI systems, particularly large language models (LLMs). The discussion covers why evals matter, why metrics are hard to define, how labor-intensive the process is, and why custom, subjective evaluations are needed. Drawing on their experience at Google, the speakers emphasize benchmarking, choosing relevant metrics, and calibrating those metrics against human and user data. They introduce a scoring system that breaks a complex evaluation down into simpler, more objective signals. They also run a hands-on session in which attendees build and refine their own scoring systems using a copilot tool, a Google Sheets integration, and Python code in Colab.
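To make the idea of decomposing an evaluation into simpler signals concrete, here is a minimal Python sketch. It is not the speakers' tool or API: the `Signal` class, the `score` and `agreement` functions, the three example checks, and the weights are all hypothetical, chosen only to illustrate combining small objective checks into one score and comparing that score against human labels.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Signal:
    """One simple, objective check (hypothetical structure for illustration)."""
    name: str
    weight: float
    check: Callable[[str], float]  # returns a value in [0, 1]

def score(response: str, signals: list[Signal]) -> tuple[float, dict[str, float]]:
    """Return the weighted average of all signals plus the per-signal breakdown."""
    per_signal = {s.name: s.check(response) for s in signals}
    total_weight = sum(s.weight for s in signals)
    overall = sum(s.weight * per_signal[s.name] for s in signals) / total_weight
    return overall, per_signal

def agreement(labeled: list[tuple[str, bool]], signals: list[Signal],
              threshold: float) -> float:
    """Fraction of examples where (score >= threshold) matches the human label."""
    hits = sum((score(text, signals)[0] >= threshold) == label
               for text, label in labeled)
    return hits / len(labeled)

# Illustrative signals only; real ones would be specific to the application.
SIGNALS = [
    Signal("non_empty", 1.0, lambda r: 1.0 if r.strip() else 0.0),
    Signal("under_50_words", 1.0, lambda r: 1.0 if len(r.split()) <= 50 else 0.0),
    Signal("mentions_topic", 2.0, lambda r: 1.0 if "metric" in r.lower() else 0.0),
]

if __name__ == "__main__":
    overall, breakdown = score("Metrics matter.", SIGNALS)
    print(overall, breakdown)
    # Hand-labeled examples (made up) used to check the score against human judgment.
    labeled = [("Metrics matter.", True), ("Hello", False)]
    print(agreement(labeled, SIGNALS, threshold=0.75))
```

The per-signal breakdown shows which check caused a low score, and the agreement figure is one simple way to see whether the weights and threshold track human judgment before trusting the overall number.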
